Use server-side rendering in case wrtc peers can't be reached

dat-ecosystem-archive / datBase

Open data sharing powered by Dat [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]

http://datbase.org


Use server-side rendering in case wrtc peers can't be reached #103

Closed: okdistribute closed this issue 8 years ago

okdistribute commented 8 years ago

There are many ways to do this, and we could experiment. The essential problem is that dat.land currently only looks for client-side connections (using WebRTC), and WebRTC has several problems:

- unreliable
- slow
- browser-based

If a WebRTC connection can't be established, we can use dat.haus (https://github.com/juliangruber/dat.haus), a composable API to the Dat network, to do this fetching server-side for us. There are many ways we could prioritize connections:

- make WebRTC and dat.haus calls at the same time; whichever comes back first wins.
- give WebRTC 3-5 seconds before calling out to dat.haus.
- always prioritize dat.haus connections if we can get one.
- use dat.haus to fetch particular information (tarballs, large files, etc.) and WebRTC for small things.
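The first two prioritization options can be sketched as small promise combinators. This is a minimal sketch, assuming each transport (WebRTC in the browser, an HTTP call to dat.haus) is wrapped in a function that returns a Promise; `Fetcher`, `firstSuccess` and `withGracePeriod` are hypothetical names, not part of dat.land or dat.haus.

```typescript
// A transport is modeled as a function that starts a request and returns a
// Promise. Real WebRTC / dat.haus fetchers are assumed, not shown here.
type Fetcher<T> = () => Promise<T>;

// Option 1: start every source at once and resolve with the first success.
// Rejects (with the last error) only when every source has failed.
function firstSuccess<T>(sources: Fetcher<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (sources.length === 0) {
      reject(new Error("no sources"));
      return;
    }
    let failures = 0;
    for (const source of sources) {
      source().then(resolve, (err) => {
        failures += 1;
        if (failures === sources.length) reject(err);
      });
    }
  });
}

// Option 2: give the primary source (e.g. WebRTC) `graceMs` milliseconds on
// its own. If it has not answered by then, or fails early, also start the
// fallback (e.g. dat.haus); whichever succeeds first still wins.
function withGracePeriod<T>(
  primary: Fetcher<T>,
  fallback: Fetcher<T>,
  graceMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let inFlight = 1;
    let fallbackStarted = false;

    const onFailure = (err: unknown) => {
      inFlight -= 1;
      // Give up only after the fallback has been tried and nothing is pending.
      if (inFlight === 0 && fallbackStarted) reject(err);
    };

    const startFallback = () => {
      if (fallbackStarted) return;
      fallbackStarted = true;
      inFlight += 1;
      fallback().then(resolve, onFailure);
    };

    const timer = setTimeout(startFallback, graceMs);
    primary().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        startFallback();
        onFailure(err);
      },
    );
  });
}
```

With `graceMs` set to 3000-5000, `withGracePeriod` corresponds to the second option; passing both fetchers to `firstSuccess` corresponds to the first. Unlike a plain `Promise.race`, neither helper fails the whole request just because one source fails fast.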

Before we integrate this into dat.land, we should probably look into how much we care about bandwidth/storage costs on dat.haus.

laurengarcia commented 8 years ago

Don't forget that server-side rendering helps with SEO, if that's a priority.

mafintosh commented 8 years ago

If the main concern is supporting most browsers and compatibility with the non-WebRTC network, one solution would be to run WebSocket <--> discovery-swarm gateways and use a WebSocket to join the swarm locally, in addition to WebRTC.

Some nice features of this approach:

- Anybody can run a gateway. Because the connection is encrypted, a gateway won't know what data it is sharing. This is not true for an HTTP proxy like dat.haus. It also means that a university could help us out by running one.
- Supports all browsers
- Trivial to integrate (WebSockets are just streams, using websocket-stream)
- Will work offline
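The privacy argument rests on the gateway being a dumb pipe: it copies bytes between the browser's WebSocket and a swarm connection without decoding them, so an encrypted replication stream stays opaque to it. The sketch below models that with a hypothetical `ByteStream` interface and an in-memory `createPipe` helper; a real gateway would use actual WebSocket and discovery-swarm connections (not shown) and might have to handle more than one peer per browser.

```typescript
// Hypothetical minimal bidirectional byte stream (stand-in for a WebSocket
// stream on one side and a swarm peer connection on the other).
interface ByteStream {
  write(chunk: Uint8Array): void;
  onData(listener: (chunk: Uint8Array) => void): void;
}

// In-memory pair of connected endpoints, for demonstration only:
// writing on one end delivers the chunk to listeners on the other end.
function createPipe(): [ByteStream, ByteStream] {
  const listeners: Array<Array<(chunk: Uint8Array) => void>> = [[], []];
  const end = (self: number): ByteStream => ({
    write: (chunk) => {
      for (const listener of listeners[1 - self]) listener(chunk);
    },
    onData: (listener) => {
      listeners[self].push(listener);
    },
  });
  return [end(0), end(1)];
}

// The gateway: forward chunks in both directions without inspecting them.
function relay(a: ByteStream, b: ByteStream): void {
  a.onData((chunk) => b.write(chunk));
  b.onData((chunk) => a.write(chunk));
}
```

Because `relay` never parses a chunk, whatever the two ends exchange (here, assumed to be an already-encrypted stream) passes through unchanged.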

For SEO, you'd want to cache the metadata feed server-side (a dat with the history but no content) and render that.
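A minimal sketch of rendering a crawlable page from cached metadata alone, assuming the server already holds each entry's path and size; `Entry` and `renderListing` are hypothetical, and the real metadata feed format is not shown. File names come from arbitrary archives, so they are HTML-escaped and the link paths are percent-encoded.

```typescript
// Hypothetical shape of one cached metadata entry (no file content needed).
interface Entry {
  path: string; // e.g. "data/2016.csv"
  size: number; // bytes
}

// Escape text before inserting it into HTML, since file names are untrusted.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a static HTML listing for an archive from metadata only.
function renderListing(archiveKey: string, entries: Entry[]): string {
  const rows = entries
    .map((e) => {
      const encodedPath = e.path.split("/").map((s) => encodeURIComponent(s)).join("/");
      const href = `/${encodeURIComponent(archiveKey)}/${encodedPath}`;
      return `<li><a href="${escapeHtml(href)}">${escapeHtml(e.path)}</a> (${e.size} bytes)</li>`;
    })
    .join("\n");
  return `<!doctype html>\n<title>${escapeHtml(archiveKey)}</title>\n<ul>\n${rows}\n</ul>`;
}
```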

juliangruber commented 8 years ago

+1 to the WebSocket transport; that's a great idea.

I can see a few implications of server-side rendering for SEO, though: Google crawls a lot, which would basically force dat.haus to replicate everything eventually. Google also factors response time into its ranking, so an archive that currently isn't seeded might lower dat's overall ranking.

Do we think those are concerns for us? Ah! Maybe we only do server-side rendering for archives that are paid to be kept alive?

laurengarcia commented 8 years ago

Yeah, I can't imagine doing server-side rendering without some kind of cache. I was imagining it would be for "partner"-level users/orgs that want their data found.
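One way to combine the "paid archives only" idea with a cache: partner archives get a server-rendered page kept in a small time-to-live cache, and every other archive gets the client-rendered shell, so crawler traffic can't make the server replicate arbitrary archives. This is a sketch under assumptions; `RenderCache`, `pageFor`, the partner check and the TTL are all hypothetical.

```typescript
interface CacheEntry {
  html: string;
  expiresAt: number; // ms timestamp after which the page is re-rendered
}

// Tiny time-to-live cache for rendered pages, keyed by archive key.
class RenderCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): string | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // stale: force a fresh render
      return undefined;
    }
    return hit.html;
  }

  set(key: string, html: string): void {
    this.entries.set(key, { html, expiresAt: this.now() + this.ttlMs });
  }
}

// Server-render only partner archives, reusing cached HTML when possible;
// all other archives get the client-rendered shell (WebRTC in the browser).
function pageFor(
  key: string,
  isPartner: (key: string) => boolean,
  cache: RenderCache,
  render: (key: string) => string,
  clientShell: string,
): string {
  if (!isPartner(key)) return clientShell;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const html = render(key);
  cache.set(key, html);
  return html;
}
```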

laurengarcia commented 8 years ago

ogd: "On GitHub, when you click a subfolder it makes a new page request for that subfolder's view. So I guess we would do the same, as opposed to sending the file metadata tree down to the client and rendering it client-side."
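The GitHub-style model means each request renders a single folder level. Assuming the server has the archive's file list as flat paths (for example from the cached metadata), the immediate children of one folder can be computed like this; `listFolder` and `FolderView` are hypothetical names.

```typescript
// What one folder page shows: its direct subfolders and files.
interface FolderView {
  folders: string[];
  files: string[];
}

// Return the immediate children of `dir` ("" for the archive root),
// the way a GitHub-style tree page shows one level at a time.
function listFolder(paths: string[], dir: string): FolderView {
  const prefix = dir === "" ? "" : dir.replace(/\/+$/, "") + "/";
  const folders = new Set<string>();
  const files: string[] = [];
  for (const path of paths) {
    if (!path.startsWith(prefix)) continue;
    const rest = path.slice(prefix.length);
    const slash = rest.indexOf("/");
    if (slash === -1) files.push(rest); // a file directly in this folder
    else folders.add(rest.slice(0, slash)); // first segment of a deeper path
  }
  return { folders: Array.from(folders).sort(), files: files.sort() };
}
```

Only the requested level is sent to the browser, instead of the full metadata tree.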

laurengarcia commented 8 years ago

Another note on this (per IRC conversation with Karissa, Max and me on 7/22). Prioritization order of data fetching:

- server-render dat metadata for a dataset by syncing/proxying it with sparse: true
  - each subdir of a dat gets its own server-rendered page (similar to GitHub)
- after the initial server render of a dat, the dat.land browser app will look for WebRTC peers and render data from them if the user initiates actions that require more data (e.g. paginating through a .csv file). Fallback strategies that have been discussed:
  - if no WebRTC peers can be found, fall back to the synced/proxied dat feed from the server via XHR?
  - OR use a WebSocket to join the swarm instead of WebRTC, as mafintosh suggested above
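The second step is an ordered fallback chain. A minimal sketch, assuming each transport (WebRTC peers, the server's synced feed over XHR, a WebSocket into the swarm) can be wrapped behind a hypothetical `Transport` interface that reads a byte range of a file; byte ranges are just one assumed way to page through a .csv.

```typescript
// Hypothetical transport abstraction: each one can read bytes [start, end)
// of a file in the archive.
interface Transport {
  name: string;
  fetchRange(path: string, start: number, end: number): Promise<Uint8Array>;
}

interface FetchResult {
  via: string; // which transport answered
  data: Uint8Array;
}

// Try transports strictly in priority order and return the first success.
// Later transports are only contacted if every earlier one has failed.
async function fetchWithFallback(
  transports: Transport[],
  path: string,
  start: number,
  end: number,
): Promise<FetchResult> {
  const errors: string[] = [];
  for (const transport of transports) {
    try {
      const data = await transport.fetchRange(path, start, end);
      return { via: transport.name, data };
    } catch (err) {
      errors.push(`${transport.name}: ${String(err)}`);
    }
  }
  throw new Error(`all transports failed (${errors.join("; ")})`);
}
```

Unlike racing all sources at once, this only hits the server when no peer could answer, which keeps server bandwidth down.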

okdistribute commented 8 years ago

Yeah, this sounds great. Also, server-side rendering forces us to have nice URL schemes, so people can link to a particular file or subdir.
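Linkable URLs need a route scheme that round-trips between a URL and an (archive, path) pair. A sketch assuming the dat.land/<archiveId>/<subfolder> shape mentioned later in the thread; `parseArchiveUrl` and `archiveUrl` are hypothetical helpers, and the same parsing would work for a client-side router.

```typescript
// Hypothetical route shape: /<archiveKey>[/<path inside the archive>]
interface ArchiveRoute {
  archiveKey: string;
  path: string; // "" means the archive root
}

// Parse a URL pathname into an archive key and an in-archive path.
// Returns null for the site root or for malformed percent-encoding.
function parseArchiveUrl(pathname: string): ArchiveRoute | null {
  let parts: string[];
  try {
    parts = pathname
      .split("/")
      .filter((p) => p !== "")
      .map((p) => decodeURIComponent(p));
  } catch {
    return null;
  }
  if (parts.length === 0) return null;
  return { archiveKey: parts[0], path: parts.slice(1).join("/") };
}

// Build the canonical, shareable URL for an archive route.
function archiveUrl(route: ArchiveRoute): string {
  const segments = [route.archiveKey, ...route.path.split("/").filter((p) => p !== "")];
  return "/" + segments.map((s) => encodeURIComponent(s)).join("/");
}
```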

laurengarcia commented 8 years ago

[UPDATE] @maxogden, @karissa and @Kriesse decided to just go with client-rendered archive file and subfolder views within each archive page. That means no server-rendered dat.land/<archiveId>/<subfolder> routes.

okdistribute commented 8 years ago

I think this larger issue is captured by a bunch of smaller issues; going to close this.