FredHutch / wiki

SciWiki: Collective KnowledgeBase for Scientific Data and Use
https://sciwiki.fredhutch.org

_datascience directory created on build #913

Open chrisequalsdev opened 1 year ago

chrisequalsdev commented 1 year ago

After building, the dir ./_site/_datascience is created, which is a duplicate of ./_site/datascience. These are the only underscore dirs in the ./_site/ dir that are not asset related. This creates some duplicate pages in the site.

For example: normal page underscore page

This creates two issues which may or may not really matter:

  1. Duplicate pages (search results might lead to the one that doesn't show a sidebar)
  2. Creates some issues for the broken link checking effort (although for automation purposes the ./_site/_datascience dir can be ignored with bundle exec htmlproofer ./_site --ignore-files "/\.\/_site\/_datascience\/.*/")

The broken links are because of the link to #main which exists in the base template and links to the top of the content on each page. This link doesn't work on underscore pages.

A normal example looks like this: normal page normal page main content

Underscore page example: underscore page underscore page main content
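One way to list every built page affected by the #main problem is sketched below. It assumes the template link is written literally as href="#main" and that the target element carries id="main"; both strings are guesses based on the description above, and the function name is hypothetical. It relies on grep's --include option, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Sketch: print built pages that link to #main but have no id="main" target.
# Assumption: the link and target appear literally as href="#main" / id="main".
find_broken_main_anchor() {
    site_dir="${1:-./_site}"
    # Pages that contain a link to #main ...
    grep -rl --include='*.html' 'href="#main"' "$site_dir" |
    while IFS= read -r page; do
        # ... but no element that the link could jump to
        if ! grep -q 'id="main"' "$page"; then
            echo "$page"
        fi
    done
}
```

Running find_broken_main_anchor ./_site after a build should print the underscore pages, and possibly others with the same problem.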

Basically I think the ./_site/_datascience needs to go. I'm not sure why that gets created.
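To confirm after each build which underscore dirs duplicate a normal dir, something like the following could be used. This is only a sketch: it treats "duplicate" as "a dir named _X exists next to a dir named X" and does not compare contents. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list underscore dirs in the site output that have a same-named
# sibling without the leading underscore (e.g. _datascience vs datascience).
list_underscore_dupes() {
    site_dir="${1:-./_site}"
    for d in "$site_dir"/_*/; do
        # Skip the unexpanded glob when no underscore dirs exist
        [ -d "$d" ] || continue
        name=$(basename "$d")
        plain="${name#_}"
        if [ -d "$site_dir/$plain" ]; then
            echo "$name"
        fi
    done
}
```

Asset dirs with no plain-named sibling are not reported, so the output should be limited to dirs like _datascience.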

Sidenote: The same link issue also exists on three other pages. I haven't looked into these yet:
https://sciwiki.fredhutch.org/archive/cluster_koshuBeta/index.html
https://sciwiki.fredhutch.org/archive/galaxy-on-prem/index.html
https://sciwiki.fredhutch.org/contributorTemplate/index.html

vortexing commented 1 year ago

This spot in the config might help us - perhaps we should explicitly EXCLUDE more folders? https://github.com/FredHutch/wiki/blob/0d6f96c7e751731cbbaa2f214487280e4214a6bf/_config.yml#L109

laderast commented 2 months ago

More research needed

dtenenba commented 1 week ago

A very crude workaround for this would be to explicitly remove _site/_datascience as part of the build process. I guess we should first check whether any pages link to content there.
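A rough sketch of that workaround as a post-build step follows. The function name is hypothetical, and it assumes that links into the duplicate would contain the string /_datascience/ in the built HTML. It refuses to delete if any page outside the duplicate still links there. The --include and --exclude-dir options are GNU/BSD grep extensions.

```shell
#!/bin/sh
# Sketch: remove the duplicate output dir after a build, unless some page
# outside it still links into it. Assumes such links contain "/_datascience/".
remove_underscore_dupe() {
    site_dir="${1:-./_site}"
    dupe="$site_dir/_datascience"
    # Nothing to do if the build did not create the duplicate
    [ -d "$dupe" ] || return 0
    # grep prints any linking pages and succeeds only if it found at least one
    if grep -rl --include='*.html' --exclude-dir='_datascience' \
            '/_datascience/' "$site_dir"; then
        echo "pages listed above link into _datascience; not removing" >&2
        return 1
    fi
    rm -rf "$dupe"
}
```

This would run right after the site build and before the link check, so the --ignore-files pattern would no longer be needed.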

A related question is whether the search engine should be indexing orphaned content like this, if there are no links to it.