Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

Evaluate crawling capability of Sage websites, portals and apps #493

Closed: tschaffter closed this issue 1 year ago

tschaffter commented 2 years ago

The goal is to crawl these sites to determine whether they are optimized for search engines (SEO). We will use the crawler recently added to this repo.

Main:

Portals:

Shiny apps:

tschaffter commented 2 years ago

Configuration

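const SitemapGenerator = require('sitemap-generator');
// Assumed CLI: node tools/generate-sitemap.js <siteUrl> <sitemapFilepath>
const [siteUrl, sitemapFilepath] = process.argv.slice(2);
const generator = SitemapGenerator(siteUrl, {
  filepath: sitemapFilepath,
  maxDepth: 2, // limit how far links are followed from the start page
  maxEntriesPerFile: 50000, // split into several files (plus an index) beyond this
  stripQuerystring: false // keep URLs that differ only by query string as separate entries
});
generator.on('error', (error) => console.error(error)); // e.g. 404s on crawled URLs
generator.on('done', () => console.log(`Sitemap written to ${sitemapFilepath}`));
generator.start();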
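To make the role of `maxDepth` concrete, here is a simplified model of a depth-limited, same-host crawl. It is only a sketch, not how `sitemap-generator` works internally: pages come from a hypothetical in-memory `samplePages` map instead of the network, links are extracted with a regular expression, and `maxHops` counts link hops away from the start page, which may not match the library's own depth convention.

```javascript
// Simplified model of a depth-limited, same-host crawl. Not the internals of
// sitemap-generator: pages come from an in-memory map instead of the network.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /<a\s[^>]*href="([^"]+)"/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      links.push(new URL(match[1], baseUrl).href); // resolve relative hrefs
    } catch {
      // skip hrefs that cannot be parsed as URLs
    }
  }
  return links;
}

function crawl(startUrl, pages, maxHops) {
  const host = new URL(startUrl).host;
  const found = new Set([startUrl]);
  let frontier = [startUrl];
  // Breadth-first: each iteration follows one more hop away from the start page
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const url of frontier) {
      for (const link of extractLinks(pages[url] || '', url)) {
        // stay on the same host and never revisit a URL
        if (new URL(link).host === host && !found.has(link)) {
          found.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return [...found];
}

// Hypothetical sites: a client-side-rendered app and a small static site
const samplePages = {
  'https://spa.example.org/': '<html><body><app-root></app-root></body></html>',
  'https://static.example.org/': '<a href="/about">About</a> <a href="https://other.example.org/">Elsewhere</a>',
  'https://static.example.org/about': '<a href="/team">Team</a>',
  'https://static.example.org/team': '<a href="/history">History</a>',
};

console.log(crawl('https://spa.example.org/', samplePages, 2)); // → [ 'https://spa.example.org/' ]
console.log(crawl('https://static.example.org/', samplePages, 2));
```

A client-side-rendered app whose initial HTML is only an empty mount point exposes no links, so this model returns just the start URL whatever the depth limit. That is consistent with, though not proof of, the single-page results reported for several of the sites below.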

https://sagebionetworks.org

node tools/generate-sitemap.js http://www.sagebionetworks.org sitemap-sagebionetworks.xml

Number of <url>: 29

https://synapse.org

node tools/generate-sitemap.js http://www.synapse.org sitemap-synapse.xml

Number of <url>: 1

https://agora.adknowledgeportal.org

node tools/generate-sitemap.js https://agora.adknowledgeportal.org sitemap-agora.xml

Number of <url>: 1

https://adknowledgeportal.synapse.org

node tools/generate-sitemap.js https://adknowledgeportal.synapse.org sitemap-adknowledgeportal.xml

Number of <url>: 1

https://nf.synapse.org

node tools/generate-sitemap.js https://nf.synapse.org sitemap-nf.xml

Number of <url>: 1

https://csbc-pson.synapse.org

Note: https://csbc-pson.synapse.org redirects to https://cancercomplexity.synapse.org/, so the redirect target is crawled instead.

node tools/generate-sitemap.js https://cancercomplexity.synapse.org sitemap-cancercomplexity.xml

Warning: The crawler does not save the sitemap file when started from https://csbc-pson.synapse.org, possibly because of the redirect.
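One possible workaround, assuming the redirect is indeed what prevents the file from being saved (this is still an open question), is to resolve the final URL first and hand that to the crawler. This sketch relies on the global `fetch` of Node 18+; `resolveFinalUrl` is a hypothetical helper, and its `fetchImpl` parameter is injectable so it can be exercised without network access.

```javascript
// Hypothetical helper: follow redirects and return the URL that was finally
// served. Relies on the global fetch available in Node 18+.
async function resolveFinalUrl(url, fetchImpl = globalThis.fetch) {
  // redirect: 'follow' is fetch's default; response.url holds the post-redirect URL
  const response = await fetchImpl(url, { redirect: 'follow' });
  return response.url || url;
}

// Usage (requires network access):
// resolveFinalUrl('https://csbc-pson.synapse.org').then((finalUrl) => {
//   console.log(finalUrl); // pass this URL to the crawler instead
// });
```

This would not explain the iAtlas Shiny case, which fails without any redirect.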

Number of <url>: 1

https://www.cri-iatlas.org

node tools/generate-sitemap.js https://www.cri-iatlas.org sitemap-iatlas.xml

Number of <url>: 6

https://isb-cgc.shinyapps.io/iatlas

node tools/generate-sitemap.js https://isb-cgc.shinyapps.io/iatlas sitemap-iatlas-shiny.xml

Warning: The crawler does not save the sitemap file here either, even though this URL does not redirect.

tschaffter commented 2 years ago

About the crawler not saving the sitemap file

tschaffter commented 2 years ago

Terminology

Results

Valid as of 2022/08/04.

| Site | Pages found | SSR enabled | sitemap.xml | Technology | Contact |
| --- | --- | --- | --- | --- | --- |
| https://synapse.org | 1 | | | Google Web Toolkit | Jay Hodgson |
| https://sagebionetworks.org | 29 | | | WordPress | |
| https://www.cri-iatlas.org | 6 | | | WordPress | |
| https://challenge-registry.org | 3 | | | Angular | Thomas Schaffter |
| https://agora.adknowledgeportal.org | 1 | | | Angular | Anna Greenwood |
| https://adknowledgeportal.synapse.org | 1 | | | React | Jay Hodgson |
| https://nf.synapse.org | 1 | | ✅ (empty) | React | Jay Hodgson |
| https://cancercomplexity.synapse.org | 1 | | | React | Jay Hodgson |
| https://isb-cgc.shinyapps.io/iatlas | NA* | | | Shiny | Andrew Lamb |

*The crawler used seems to be unable to crawl Shiny apps.
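The SSR enabled and sitemap.xml columns could be filled in automatically with checks along these lines. Both are assumed heuristics, not a standard test: a page is treated as server-rendered if its HTML still contains at least `minChars` characters of visible text (an arbitrary threshold) once scripts, styles and tags are removed, and `/sitemap.xml` is reported as missing (`null`), empty (`0`) or by its `<url>` count. The function names are hypothetical, and `audit` assumes the global `fetch` of Node 18+.

```javascript
// Heuristic SSR check (an assumption, not a standard test): client-side-rendered
// apps usually ship an empty mount point such as <app-root></app-root>, so little
// visible text remains once scripts, styles and tags are stripped.
function looksServerRendered(html, minChars = 200) {
  const visibleText = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return visibleText.length >= minChars;
}

// Returns null when /sitemap.xml is missing, otherwise its number of <url>
// entries (0 would match a sitemap that exists but is empty).
function sitemapUrlCount(httpStatus, body) {
  if (httpStatus !== 200) return null;
  return (body.match(/<url>/g) || []).length;
}

// Hypothetical audit of one site, assuming the global fetch of Node 18+.
async function audit(siteUrl) {
  const page = await fetch(siteUrl);
  const sitemap = await fetch(new URL('/sitemap.xml', siteUrl));
  return {
    ssr: looksServerRendered(await page.text()),
    sitemapUrls: sitemapUrlCount(sitemap.status, await sitemap.text()),
  };
}
```

Caveat: a single-page app that answers every path with its `index.html` and a 200 status would show up here as an empty sitemap, so also checking the `Content-Type` header would make `sitemapUrlCount` more reliable.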

Conclusions

References