NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0
41 stars 26 forks source link

Investigate the feasibility of not using # fragments in URLs #431

Closed amoeba closed 6 years ago

amoeba commented 6 years ago

We've talked about this as a group numerous times over the years and the need to reconsider this has cropped up recently due to some GeoLink work @rushirajnenuji and I are doing. We think MetacatUI dataset landing pages (#view/{PID}) are not indexed by Google (at least in part) due to the use of hash fragments in our URLs. Another possibility may have to do with the use of a client-side web app though my understanding is that Google has sufficient ability to index client-side webapps as of a few years ago.

Concerns:

Specific tasks here include:

mbjones commented 6 years ago

I think one other reason that Google may not index is lack of a sitemap.xml file that lists all of the dataset landing pages. Google could in theory follow the search results page links, but it may not properly parse those. Not sure. Metacat has a sitemap generator which probably needs to be updated to reflect our new landing page URI formats, but is likely adequate to the task of generating a full suite of dataset lists. See edu.ucsb.nceas.metacat.Sitemap.java for details.

laurenwalker commented 6 years ago

Thanks for looking into this. I agree it would be a nice improvement. We should make sure we do all the development for this in a branch.

We are using Backbone 1.3.3 in MetacatUI right now, which I believe is the latest version; so no upgrade needed.

csjx commented 6 years ago

When we first started MetacatUI, there was not enough support for the HTML5 pushState() method for the browser history, so we punted and left the hashtag in the URL. I think there's plenty of support now, sow we can use the Router.history.start({pushState: true}) and manage client side routes. Of course, figuring out the redirects for consistency like @amoeba said above is the trick.

rushirajnenuji commented 6 years ago
Determine the cause(s) of Google not indexing MetacatUI's dataset landing pages:
amoeba commented 6 years ago

Thanks for the notes @rushirajnenuji. So does it seem pretty feasible to switch URLs like

https://search.dataone.org/#view/knb.92138.1

to

https://search.dataone.org/view/knb.92138.1

?

What do you think about making the change on a branch (maybe just to the #view route), deploying it somewhere, and then pointing Google's Webmaster Tools at it to check that it works before we get much further along?

rushirajnenuji commented 6 years ago

@amoeba yes, sure. I think we should give it a try. I'll create a branch and start working on it.

laurenwalker commented 6 years ago

I compiled a list of the browser versions our visitors use on arcticdata.io and search.dataone.org. I cut off after 1%, so this is not a full list.

This should be helpful for checking if we can support the majority of our users with the pushState() change.

search.dataone.org, Jan 2017 - Jan 2018

Browser Browser Version Sessions % Sessions New Users
Chrome 60 902 9.49% 445
Chrome 56 898 9.44% 443
Chrome 61 827 8.70% 438
Chrome 59 757 7.96% 368
Chrome 58 633 6.66% 335
Chrome 55 546 5.74% 283
Chrome 57 463 4.87% 229
Chrome 62 444 4.67% 210
Firefox 52 281 2.96% 151
Internet Explorer 11 280 2.94% 235
Firefox 54 207 2.18% 112
Firefox 53 194 2.04% 120
Firefox 51 193 2.03% 104
Firefox 50 170 1.79% 78
Safari 10 170 1.79% 105
Firefox 56 156 1.64% 102
Firefox 55 154 1.62% 100
CrossrefEventDataBot (not set) 143 1.50% 142
Chrome 63 116 1.22% 70
Firefox 57 100 1.05% 67
Internet Explorer 8 97 1.02% 97

arcticdata.io, Jan 2017 - Jan 2018

Browser Browser Version Sessions % Sessions New Users
Chrome 56 2,108 11.24% 533
Chrome 58 1,770 9.44% 357
Chrome 57 1,613 8.60% 241
Chrome 59 1,499 7.99% 231
Chrome 60 1,448 7.72% 298
Chrome 61 1,289 6.87% 308
Chrome 55 997 5.32% 232
Safari 10 768 4.10% 371
Chrome 62 732 3.90% 177
Firefox 51 425 2.27% 164
Internet Explorer 11 414 2.21% 280
Firefox 54 395 2.11% 120
Firefox 52 385 2.05% 193
Firefox 53 292 1.56% 146
Chrome 63 250 1.33% 58
Firefox 57 245 1.31% 78
Firefox 56 210 1.12% 105
Firefox 55 188 1.00% 93
Firefox 50 177 0.94% 102

Full spreadsheet here: https://docs.google.com/spreadsheets/d/1QbcngOLWGLB58Nisa9RJM7sblCQeN2DSHRTkVdd11e0/edit?usp=sharing

amoeba commented 6 years ago

Thanks for compiling that list, @laurenwalker. That looks quite promising for us to use pushState, and showing users without that functionality enabled/available to the app a notice.

amoeba commented 6 years ago

@rushirajnenuji and I are still working on this testing whether it's feasible to make this change in the router at all. He made the initial change to the codebase on a branch and he and I are currently figuring out the necessary Apache config.

I did some testing and it seems like nothing (so far) in MetcatUI is problematic but we are struggling to find the right Apache config to make stateless entry work (e.g., be able to enter the app at /view/{PID}. I think we just have conflicting Apache config directives so I don't expect this to be too hard to address.

amoeba commented 6 years ago

I think I figured out the Apache config issues preventing part of this from working, though my understanding of Apache configs is apparently not good enough to know why.

Now I'm struggling with Require.js and its interaction with the rewrite rules. When this code is called,

require(['views/MetadataView']...

the request that's sent is to

https://handy-owl.nceas.ucsb.edu/view/js/views/MetadataView.js?v=2.0.0RC4

while the path the file exists on is just /js/views/MetadataView.js?v=2.0.0RC4. Require.js has configuration and it looks like this might be overridable.

amoeba commented 6 years ago

Great progress: I figured out that the issues was simple in that Require.js assumes relative URLs are relative to your URL (duh, right?). So I went through and made every URL referencing JSS/CSS/etc I could find absolute (js/ => /js) and so on. Also had to make a tweak to the MetadataView but otherwise this appears to be sorta working: https://handy-owl.nceas.ucsb.edu/view/urn:uuid:3ec5f245-ebe8-4870-bc0c-73d5f2ec7a87

This is pretty close to where it can be discussed with the group.

csjx commented 6 years ago

The relative URLs are relative to the baseUrl attribute stated in the call to requirejs.config(). Maybe the docs can help with this, but it sounds like sticking with relative URLs helps when you use their optimizer, which we have a ticket for.

Anyway, is the Router matching the /view/{pid} correctly now that you're using pushState in that branch?

amoeba commented 6 years ago

The relative URLs are relative to the baseUrl attribute stated in the call to requirejs.config().

That's not quite what I was seeing, but we could take a look. They ended up being relative to the current URL path. Since we are artificially putting the user on /view/{pid} or /profile/{profile}, a Require.js call like require(['components/x.js']) results in a call in the browser to /views/components/requires.js which is not where the JS file is. Changing everything to absolutes is the only fix I've found works so far.

That level of monkeying around worries me so we still have more to do see if this would work and should be implemented.

Anyway, is the Router matching the /view/{pid} correctly now that you're using pushState in that branch?

Yes, that works (had to change code in the route). I'll push my work up so we can see the diff.

amoeba commented 6 years ago

So we can look together and get some feedback, I've pushed a commit to @rushirajnenuji 's support-shareable-urls branch with what I think is the minimum required set of changes to make this work: https://github.com/NCEAS/metacatui/commit/059bb0332592a77f8b2e642839cc776d2bbae966

Note: Making of some URLs absolute paths was unavoidable. This presents a liability in that we either:

  1. Have to carefully adjust all these things when we deploy MetacatUI to different locations relative to the top level of the server
  2. Consider dynamically generating all of these URLs so configuration only happens in one place
amoeba commented 6 years ago

We talked about this on our dev call today and it was mentioned that I never shared the Apache config. We took a basic Apache config and I had to add:

<Directory "/var/www/handy-owl.nceas.ucsb.edu">
  Options Indexes MultiViews FollowSymLinks
  AllowOverride None
  Require all granted

  <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.html$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.html [L]
    </IfModule>
</Directory>

I thought I had to have a RewriteCond that ignored the metacat context but I'm not seeing that in my config and it works. To be investigated further.

amoeba commented 6 years ago

Re: bullet points 1 and 2 in the second above comment, we discussed strategies for this on our last dev call and one of those strategies was to factor our the deployment path into an app-wide config and use that everywhere the app tries to request static files (images, js, css).

I..

So this appears to be a go. I know @csjx is refactoring this part of of the app so I'll hold off on this for now.

amoeba commented 6 years ago

Great news: Googlebot can render MetacatUI MetadataView pages perfectly AFAICT.

amoeba commented 6 years ago

Wrote this work and the work on #432 up into a Google Doc so we could discuss on a dev call and also get early feedback from Dave V: https://docs.google.com/document/d/1yU-d-aFdtiSB91Wk0sFj1xthW8skBPof9qKwx00bjxE/edit?usp=sharing

Got some really good feedback, but no comments indicating a change in course here.

amoeba commented 6 years ago

I was doing some searching around while continuing this work today and it looks like Apache 2.2+ has FallbackResource which does all of the work of the ModRewrite chunk I had in the test config before.

Before:

    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteBase /metacatui
        RewriteRule ^index\.html$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /metacatui/index.html [L]
    </IfModule>

After

FallbackResource /index.html

🎉

laurenwalker commented 6 years ago

Awesome! Can it be restricted to only the url path in which MetacatUI is deployed? (So that 404 errors are returned for other nonexistent resources.)

Lauren

On Wed, Jun 6, 2018, 6:33 PM Bryce Mecum notifications@github.com wrote:

I was doing some searching around while continuing this work today and it looks like Apache 2.2+ has FallbackResource http://httpd.apache.org/docs/trunk/mod/mod_dir.html#fallbackresource which does all of the work of the ModRewrite chunk I had in the test config before.

Before:

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /metacatui
    RewriteRule ^index\.html$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /metacatui/index.html [L]
</IfModule>

After

FallbackResource /index.html

🎉

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCEAS/metacatui/issues/431#issuecomment-395235025, or mute the thread https://github.com/notifications/unsubscribe-auth/AGVeFtBaP5oTkRquvd12bdbdpJZa_dtPks5t6FixgaJpZM4RSZff .

amoeba commented 6 years ago

Yeah. I'm testing locally with metacatui deployed in a subdir like:

<Directory "/usr/local/var/www/metacatui">
    FallbackResource /metacatui/index.html
</Directory>

and it all seems to work. The config will still need to hard-code where MetacatUI is deployed here and also at least once within the app itself. Still figuring out how this will work and work well.

amoeba commented 6 years ago

Closing as there's no work left to be done. Implementation work is being done in other Issues under the https://github.com/NCEAS/metacatui/labels/routing%20refactor label.