Prevent web crawlers from fetching unnecessary data on /doi.org, /ror.org, and /orcid.org pages

datacite / akita

DataCite Commons

https://commons.datacite.org

MIT License

Prevent web crawlers from fetching unnecessary data on /doi.org, /ror.org, and /orcid.org pages #316

Status: Closed (bklaing2 closed this 7 months ago)

bklaing2 commented 7 months ago

Purpose

Currently, when web crawlers access DOI, ROR, and ORCID pages, they needlessly fetch all of the page data, including the data for related works. This change prevents crawlers from fetching the related-works data.

Approach

The page now checks the request headers to see whether the user agent is a web crawler. If it is, the page uses a lightweight version of the GraphQL call, which omits the data for related works.
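
A minimal sketch of the approach described above, not the code merged in this PR. The pattern list, the query strings, and the names `isCrawler` and `selectQuery` are all assumptions made for illustration; the real queries and detection logic in akita may differ.

```typescript
// Hypothetical crawler pattern; the actual list used in akita is not shown in this PR text.
const CRAWLER_PATTERN = /bot|crawler|spider|crawling|slurp/i;

// Returns true when the User-Agent header value looks like a web crawler.
// A missing header is treated as a regular visitor (an assumption of this sketch).
function isCrawler(userAgent: string | null | undefined): boolean {
  if (!userAgent) return false;
  return CRAWLER_PATTERN.test(userAgent);
}

// Illustrative query documents only: the field names are placeholders,
// not a statement of the DataCite GraphQL schema.
const FULL_QUERY = `
  query ($id: ID!) {
    work(id: $id) {
      id
      relatedWorks { totalCount }
    }
  }
`;

const LIGHT_QUERY = `
  query ($id: ID!) {
    work(id: $id) {
      id
    }
  }
`;

// Picks the lightweight query for crawlers and the full query for everyone else.
function selectQuery(userAgent: string | null | undefined): string {
  return isCrawler(userAgent) ? LIGHT_QUERY : FULL_QUERY;
}
```

Regular visitors still receive the full query, so the page content they see is unchanged; only requests whose user agent matches the pattern skip the related-works data.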

Open Questions and Pre-Merge TODOs

Learning

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

Be humble in the language and feedback you give, ask don't tell.
Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
Offer suggestions on how to improve code e.g. simplification or expanding clarity.
Ensure you give reasons for the changes you are proposing.

cypress[bot] commented 7 months ago

2 flaky tests on run #1000

Details:

Merge d3457a0cec4808aed1211006185fee981a4de355 into 359507163b7e3eb1922274df2033...
Project: akita	Commit: `b7ca110445`
Status: Passed	Duration: 02:31
Started: Jan 16, 2024 12:35 PM	Ended: Jan 16, 2024 12:37 PM

search.test.ts • 1 flaky test • Tests

Test		Artifacts
... > search with enter		`Screenshots`

statistics.test.ts • 1 flaky test • Tests