PeanutBase / jekyll-peanutbase

A "starter" Jekyll site that uses the jekyll-theme-legumeinfo submodule
Apache License 2.0

Generate Peanut Genome page from YAML file #45

Closed svengato closed 1 year ago

svengato commented 1 year ago

There is now an alternative Peanut Genome page, generated on the fly from _data/genomes.yml with the template genome/index-sgr.html. Let me know if that is on the right track.

Current hard-coded Peanut Genome page, for comparison.

Optional fields: details, strain, annotation.
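To illustrate how optional fields might be handled when a page is generated from a data file, here is a minimal Python sketch. It is not the Liquid template in genome/index-sgr.html. The `name` key and the sample entries are hypothetical; only the optional field names (details, strain, annotation) come from this comment.

```python
from html import escape

# Optional fields named in the comment above; an entry may omit any of them.
OPTIONAL_FIELDS = ("details", "strain", "annotation")


def render_genome(entry: dict) -> str:
    """Return an HTML fragment for one genome entry, skipping absent optional fields."""
    # "name" is a hypothetical required key, used here only for illustration.
    lines = [f"<h3>{escape(entry['name'])}</h3>"]
    for field in OPTIONAL_FIELDS:
        value = entry.get(field)
        if value:  # missing keys, None and empty strings are all skipped
            label = field.capitalize()
            lines.append(f"<p><b>{label}:</b> {escape(str(value))}</p>")
    return "\n".join(lines)


# Hypothetical entries standing in for the parsed contents of a YAML data file.
genomes = [
    {"name": "Example genome A", "strain": "Example strain", "annotation": "ann1"},
    {"name": "Example genome B"},
]

print("\n".join(render_genome(g) for g in genomes))
```

The point of the sketch is only that the template never needs a special case per genome: an entry without a field simply produces no line for it.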

sdash-github commented 1 year ago

It does look like the current hard-coded genome page. I have one concern: the amount of hand-coding between _data/genomes.yml and genome/index-sgr.html is at least as much as we have now in the current genome/index.html. Hopefully, it is not a perception problem on my part.
I had imagined the data would come from the DS metadata repo for Arachis/... instead of hard-coding that here again in the Jekyll site.
If your attempt now is an intermediate step in that direction, that is okay. Otherwise, the ideal goal is to construct the genome page from the DS metadata repo. Please let me know if I am misinterpreting (very likely on my part) the work here.

svengato commented 1 year ago

You are correct, it was an attempt to parametrize the current site. I will look at the data store metadata next.

svengato commented 1 year ago

Do you mean the genome README files, like https://data.legumeinfo.org/Arachis/hypogaea/genomes/BaileyII.gnm1.1JTF/README.BaileyII.gnm1.1JTF.yml ?

sdash-github commented 1 year ago

It should be sourced from the datastore metadata repo: https://github.com/legumeinfo/datastore-metadata/tree/main/Arachis

Genus level resources: https://github.com/legumeinfo/datastore-metadata/tree/main/Arachis/GENUS/about_this_collection

Species level (hypogaea as example): https://github.com/legumeinfo/datastore-metadata/blob/main/Arachis/hypogaea/about_this_collection/description_Arachis_hypogaea.yml

And similarly for other spp under Arachis. This is where the taxa page resources come from.

svengato commented 1 year ago

Are the metadata in the data store copied from there?

sdash-github commented 1 year ago

My understanding: for the species and genus resources, we should edit the corresponding about_this_collection file in the datastore metadata repo to add or modify what should appear in the Jekyll pages.

Whether the README of a genome version takes its details from there is a different question. (Many of these README files, in my understanding, are much older than the datastore-metadata repo, so there would be overlap, but we don't source the Jekyll taxa pages from them.) This aspect is a question for @adf-ncgr .

svengato commented 1 year ago

Something like this?? https://dev.peanutbase.org/genome/index-sgr-2.html

This comes from the local _data/taxa/Arachis/genus_resources.yml, but I could probably get them from https://github.com/legumeinfo/datastore-metadata/blob/main/Arachis/GENUS/about_this_collection/description_Arachis.yml (which is almost identical), and add species resources.

sdash-github commented 1 year ago

  1. genome/index-sgr-2.html is getting there. The text should be as much like the current index.html as possible.
  2. We don't have genome browser and DS genome assembly+annotation links at the genus level; they are at the level of each species.
  3. But the idea of sourcing it from a DS-metadata file is correct.

Andrew generates the taxa level autocontents after cloning the DS-metadata repo locally in the VM, outside the Jekyll location. I think we can leverage the same data. I don't know where it is stored during autocontent generation. But below is what he wrote about it to me.

Andrew's Process: (06/29 email) Hi Sudhansu- here are some notes I had taken about the process; I think it assumes you are running these commands outside of the /var/www/jekyll-peanutbase/ area with a clone of the datastore-metadata in the working directory. The autocontent directory is created by default to store the results of running that command. It also assumes you have the lis-autocontent in your PATH (e.g. by pip installing lis-autocontent). We can discuss further in today's meeting if of interest.

lis-autocontent populate-jbrowse2 --jbrowse_url /tools/jbrowse2 --taxa_list taxon.yml --cmds_only > populate-jbrowse2.err 2>&1

sudo cp autocontent/Arachis/* /var/www/jekyll-peanutbase/_data/taxa/Arachis
# if a new species was added, update /var/www/jekyll-peanutbase/_data/species_list.yml
# that will also require a rebuild of all site content as:
sudo rm -rf _site

svengato commented 1 year ago

"The text should be as much like the current index.html as possible."

I will get to that later - still playing around with displaying available fields. https://dev.peanutbase.org/genome/index-sgr-2.html

adf-ncgr commented 1 year ago

I'm not following this thread closely, but a couple of quick comments:

sdash-github commented 1 year ago

"ought to exclude non-genome-browser resources from this page (i.e. anything but GBrowse/JBrowse/JBrowse2);"

Right, genome page for genome links only, that's what I meant by

"The text should be as much like the current index.html ....."

Right now Sven is:

"I will get to that later - still playing around with displaying available fields. https://dev.peanutbase.org/genome/index-sgr-2.html"

adf-ncgr commented 1 year ago

I guess I could imagine arguments for including GCV and ZZBrowse, given that they also browse genomes (albeit in somewhat specialized fashion). But it's your call on that. Also note that we could extend the YAML content if needed (such as adding attributes to indicate whether a resource was to be included as a genome browser, as we discussed previously).
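As a rough Python sketch of how such an attribute could interact with a name-based rule: the `genome_browser` attribute, the `name`/`description` keys and the `include_specialized` switch are all hypothetical and are not part of the current YAML files. When the explicit attribute is present it decides; otherwise the sketch falls back to matching the browser names mentioned in this thread.

```python
# Browser names taken from the discussion; matching is by substring,
# so "JBrowse" also covers "JBrowse2".
CORE_BROWSERS = ("GBrowse", "JBrowse")
SPECIALIZED_BROWSERS = ("GCV", "ZZBrowse")


def include_on_genome_page(resource: dict, include_specialized: bool = False) -> bool:
    """Decide whether a resource is listed as a genome browser."""
    # Hypothetical explicit attribute: if the YAML sets it, it wins.
    flag = resource.get("genome_browser")
    if flag is not None:
        return bool(flag)
    # Fallback heuristic: look for a known browser name in the text fields.
    text = f"{resource.get('name', '')} {resource.get('description', '')}"
    labels = CORE_BROWSERS + (SPECIALIZED_BROWSERS if include_specialized else ())
    return any(label in text for label in labels)


# Hypothetical resources for illustration only.
resources = [
    {"name": "JBrowse2", "description": "Genome browser"},
    {"name": "ZZBrowse", "description": "GWAS/QTL browser"},
    {"name": "Other tool", "description": "Not a browser", "genome_browser": False},
]
print([r["name"] for r in resources if include_on_genome_page(r)])
print([r["name"] for r in resources if include_on_genome_page(r, include_specialized=True)])
```

With an explicit attribute in the metadata, the decision about GCV and ZZBrowse would move out of the template and into the YAML, which is the trade-off raised above.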

svengato commented 1 year ago

Current version: https://dev.peanutbase.org/genome/index-sgr.html Let me know what you think.

  1. For grouping by genome version, it would help if each resource (genome, annotation, etc) in species_resources and species_collections had a genome version number, so that we did not have to parse its name/description to figure it out.

  2. is_genome_browser: Currently it tests whether the description contains "Browse". Alternatively, we could group by a resource_type field ('assembly', 'annotation', 'genome browser', etc).

  3. No GenBank resources exist in these YAML files, so I left those out.

  4. Had to include the additional hard-coded A. hypogaea text, of course.
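Point 1 could look roughly like this in Python (a sketch only, not the Liquid logic in the template). It assumes a hypothetical `genome_version` field and, when that is absent, falls back to parsing a `gnmN` token such as the one in BaileyII.gnm1.1JTF from the resource's name or description. Whether the resources in species_resources.yml carry such a token is an assumption.

```python
import re
from collections import defaultdict

# Matches a datastore-style genome token such as "gnm1" in "BaileyII.gnm1.1JTF".
GNM_TOKEN = re.compile(r"\bgnm(\d+)\b")


def genome_version(resource: dict):
    """Return the genome version as an int, or None if it cannot be determined."""
    explicit = resource.get("genome_version")  # hypothetical field
    if explicit is not None:
        return int(explicit)
    # Fallback: parse the name/description, which is the fragile part.
    text = f"{resource.get('name', '')} {resource.get('description', '')}"
    match = GNM_TOKEN.search(text)
    return int(match.group(1)) if match else None


def group_by_version(resources: list) -> dict:
    """Group resources by genome version; unparseable ones land under None."""
    groups = defaultdict(list)
    for resource in resources:
        groups[genome_version(resource)].append(resource)
    return dict(groups)


# Hypothetical resources; only the BaileyII name comes from this thread.
resources = [
    {"name": "BaileyII.gnm1.1JTF"},
    {"name": "Example browser", "description": "JBrowse2 for Example.gnm2"},
    {"name": "Example annotation", "genome_version": 2},
    {"name": "Unversioned resource"},
]
for version, items in group_by_version(resources).items():
    print(version, [r["name"] for r in items])
```

The `None` bucket shows why an explicit version field would help: anything whose name or description does not follow the naming pattern cannot be grouped at all.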

svengato commented 1 year ago

Just a reminder, this is ready for feedback.

Current version: https://dev.peanutbase.org/genome/index-sgr.html Let me know what you think.

sdash-github commented 1 year ago

Looked at ....jekyll-peanutbase/genome/index-sgr.html and the result https://dev.peanutbase.org/genome/index-sgr.html with respect to _data/taxa/Arachis/species_resources.yml

It definitely serves the purpose of generating the genome page via autocontent for PeanutBase.

Next steps:
I think we are ready to start a conversation on how to specify items in the metadata files for use by the genome page. (Thus, at the PB-Jekyll meeting, let us talk among ourselves and then propose it at the next group meeting: Legumista or LIS-PB, for LIS in general.)

svengato commented 1 year ago

Done. Note that the introductory text ("This page describes ... cultivated peanut itself.") and the A. hypogaea details ("Changes between the ... Additional details about ... in the chromosomes") are hard-coded for now.

adf-ncgr commented 1 year ago

https://www.peanutbase.org/genome/ has the dev site banner

not sure we need to keep a separate banner for the dev site, but if we do, we shouldn't let it bleed into prod.

svengato commented 1 year ago

It looks like this only appears in production (on the Species page as well). I will try to fix them.

svengato commented 1 year ago

Rebuilding the production site made the "Development" banner go away.