OurTechCommunity / catchup

The OTC CatchUp web app and workflows.
https://catchup.ourtech.community
MIT License

feat: Infinite Scroll for `/summary` page #72

Open KartikSoneji opened 2 years ago

KartikSoneji commented 2 years ago

Make the /summary page initially load only the last 3 summaries, and dynamically fetch more as the user scrolls.
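
A rough sketch of what I have in mind (the element IDs and the `/api/summaries` endpoint are placeholders, not names from the current codebase):

```ts
// Load 3 summaries up front, then fetch more whenever a sentinel element
// at the bottom of the list scrolls into view.
const PAGE_SIZE = 3;
let offset = PAGE_SIZE; // the first 3 summaries are rendered with the page
let loading = false;

const list = document.querySelector("#summaries")!; // placeholder IDs
const sentinel = document.querySelector("#sentinel")!;

async function loadMore(): Promise<void> {
  if (loading) return;
  loading = true;
  const response = await fetch(`/api/summaries?offset=${offset}&limit=${PAGE_SIZE}`);
  // Server returns the next batch of pre-rendered summary markup.
  list.insertAdjacentHTML("beforeend", await response.text());
  offset += PAGE_SIZE;
  loading = false;
}

// Fire loadMore() whenever the sentinel becomes visible.
new IntersectionObserver((entries) => {
  if (entries.some((entry) => entry.isIntersecting)) loadMore();
}).observe(sentinel);
```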

sreekaransrinath commented 1 year ago

Will make it a pain in the ass to scrape ;-;

KartikSoneji commented 1 year ago

a. Who is scraping the catchup page?
b. The idea is to expose an API that people can use instead of scraping.

HarshKapadia2 commented 1 year ago

> a. Who is scraping the catchup page?

We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.

> b. The idea is to expose an API that people can use instead of scraping.

That is an alternative, but it requires additional effort. What is your plan for this API? How much detail will it include? Will it send over the entire file, or will it provide options to get dates, durations, and other specific parts of the content? (This API will also act as a blocker if we have to change the file formatting in the future, as we will have to handle parsing and returning every formatting variant.) Also, more importantly, how would we let someone who wants to scrape our pages know that such a feature is available?

KartikSoneji commented 1 year ago

> > a. Who is scraping the catchup page?
>
> We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.

There was no need to scrape the website; all the data was available in the repo.

> > b. The idea is to expose an API that people can use instead of scraping.
>
> That is an alternative, but it requires additional effort. What is your plan for this API?

No, that is a side effect of implementing infinite scroll. The endpoint that will be called to get the next set of summaries will be the same one that someone might use to scrape them.
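
Something like this (an Express-style sketch only to illustrate the shape; the route and the `getSummarySections` helper are hypothetical):

```ts
import express from "express";

// Hypothetical helper: returns pre-rendered <section> fragments, newest first.
declare function getSummarySections(offset: number, limit: number): string[];

const app = express();

// The same endpoint serves the infinite scroll on /summary and anyone who
// would otherwise scrape the page.
app.get("/api/summaries", (req, res) => {
  const offset = Number(req.query.offset ?? 0);
  const limit = Math.min(Number(req.query.limit ?? 3), 20); // cap page size
  res.type("html").send(getSummarySections(offset, limit).join("\n"));
});
```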

> How much detail will it include? Will it send over the entire file, or will it provide options to get dates, durations, and other specific parts of the content?

Just the `<section>` tags that currently contain each summary in the combined summary page.

```
-e, --embedded    Output an embeddable document, which excludes the header,
                  the footer, and everything outside the body of the
                  document. This option is useful for producing documents
                  that can be inserted into an external template.
```

We shouldn't need a new parser, just the `-e` flag.
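
Assuming the summaries are rendered with Asciidoctor (the quoted flag description matches its man page), producing an embeddable fragment per summary is a single call. A sketch:

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Render one summary as an embeddable HTML fragment: -e drops the header,
// the footer, and everything outside the body; "-o -" writes to stdout.
async function renderEmbedded(path: string): Promise<string> {
  const { stdout } = await run("asciidoctor", ["-e", "-o", "-", path]);
  return stdout;
}
```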

> Also, more importantly, how would we let someone who wants to scrape our pages know that such a feature is available?

Hmm, maybe add a page, but most likely someone who wants to scrape the page will analyze the network requests. Or ask us about it.

But in general, there are very few reasons to scrape the summaries from the website. If someone wants to run static analysis, the individual files in the repo are better for that. The only other reason might be to integrate with another website, but in that case an API would be easier.

HarshKapadia2 commented 1 year ago

Makes sense. Thank you.

We should add a note somewhere for scrapers though, just to inform them about the API. (Maybe in the API response?) We will also have to document the API.
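
For example (reusing the hypothetical Express setup sketched above; the header name and the docs placeholder are made up):

```ts
import express from "express";

const app = express();

// Illustrative only: attach a note for scrapers to every API response so
// that anyone inspecting network requests finds the documented API.
app.use("/api", (_req, res, next) => {
  res.setHeader(
    "X-Scraping-Note",
    "Please use this API instead of scraping the HTML; see <API docs URL>."
  );
  next();
});
```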