Closed svengato closed 1 year ago
It does look like the current hard-coded genome page.
I have one concern:
The amount of hand-coding between _data/genomes.yml and genome/index-sgr.html is at least as much as we have now in the current genome/index.html. Hopefully, this is not a perception problem on my part.
I had imagined the data would come from the DS metadata repo for Arachis/... instead of hard-coding that here again in the Jekyll site.
If your attempt now is an intermediate step in that direction, that is okay. Otherwise, constructing the genome page sourcing from DS metadata repo is the ideal goal. Please let me know if I am misinterpreting (very likely on my part) the work here.
You are correct, it was an attempt to parametrize the current site. I will look at the data store metadata next.
Do you mean the genome README files, like https://data.legumeinfo.org/Arachis/hypogaea/genomes/BaileyII.gnm1.1JTF/README.BaileyII.gnm1.1JTF.yml ?
It should be sourced from the datastore metadata repo: https://github.com/legumeinfo/datastore-metadata/tree/main/Arachis
Genus-level resources: https://github.com/legumeinfo/datastore-metadata/tree/main/Arachis/GENUS/about_this_collection
Species level (hypogaea as example): https://github.com/legumeinfo/datastore-metadata/blob/main/Arachis/hypogaea/about_this_collection/description_Arachis_hypogaea.yml
And similarly for other spp under Arachis. This is where the taxa page resources come from.
Are the metadata in the data store copied from there?
My understanding: for the species and genus resources, we should edit the corresponding file in the about_this_collection directory of the datastore metadata repo to add or modify what should appear in the Jekyll pages.
Whether the README of a genome version sources its details from there is a different question. (Many of these README files, in my understanding, are much older than the datastore-metadata repo, so there would be overlap, but we don't source from them for the Jekyll taxa pages.) This aspect is a question for @adf-ncgr.
Something like this?? https://dev.peanutbase.org/genome/index-sgr-2.html
This comes from the local _data/taxa/Arachis/genus_resources.yml, but I could probably get them from https://github.com/legumeinfo/datastore-metadata/blob/main/Arachis/GENUS/about_this_collection/description_Arachis.yml (which is almost identical), and add species resources.
Andrew generates the taxa level autocontents after cloning the DS-metadata repo locally in the VM, outside the Jekyll location. I think we can leverage the same data. I don't know where it is stored during autocontent generation. But below is what he wrote about it to me.
Andrew's Process: (06/29 email) Hi Sudhansu- here are some notes I had taken about the process; I think it assumes you are running these commands outside of the /var/www/jekyll-peanutbase/ area with a clone of the datastore-metadata in the working directory. The autocontent directory is created by default to store the results of running that command. It also assumes you have the lis-autocontent in your PATH (e.g. by pip installing lis-autocontent). We can discuss further in today's meeting if of interest.
lis-autocontent populate-jbrowse2 --jbrowse_url /tools/jbrowse2 --taxa_list taxon.yml --cmds_only > populate-jbrowse2.err 2>&1
sudo cp autocontent/Arachis/* /var/www/jekyll-peanutbase/_data/taxa/Arachis
#if a new species was added, update /var/www/jekyll-peanutbase/_data/species_list.yml
#that will also require rebuild of all site content as:
sudo rm -rf _site
The text should be as much like the current index.html as possible.
I will get to that later - still playing around with displaying available fields. https://dev.peanutbase.org/genome/index-sgr-2.html
I'm not following this thread closely, but a couple of quick comments:
ought to exclude non-genome-browser resources from this page (ie anything but GBrowse/JBrowse/JBrowse2);
Right, genome page for genome links only, that's what I meant by
The text should be as much as like the current index.html .....
Right now Sven is:
I will get to that later - still playing around with displaying available fields. https://dev.peanutbase.org/genome/index-sgr-2.html
I guess I could imagine arguments for including GCV and ZZBrowse, given that they also browse genomes (albeit in somewhat specialized fashion). But it's your call on that. Also note that we could extend the YAML content if needed (such as adding attributes to indicate whether a resource was to be included as a genome browser, as we discussed previously).
Current version: https://dev.peanutbase.org/genome/index-sgr.html Let me know what you think.
For grouping by genome version, it would help if each resource (genome, annotation, etc) in species_resources and species_collections had a genome version number, so that we did not have to parse its name/description to figure it out.
is_genome_browser: Currently it tests whether the description contains "Browse". Alternatively, we could group by a resource_type field ('assembly', 'annotation', 'genome browser', etc).
No GenBank resources exist in these YAML files, so I left those out.
Had to include the hard coded additional A. hypogaea text, of course.
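To make the two options above concrete, here is a minimal Python sketch (the page itself is a Jekyll template, so this only mirrors the logic). The records and the field names resource_type and genome_version are hypothetical and do not exist in the current YAML files; name and description are assumed to be present on each resource.

```python
from collections import defaultdict

# Hypothetical resource records. The resource_type and genome_version
# fields are assumptions for illustration, not part of the current YAML.
resources = [
    {"name": "Example JBrowse2 entry",
     "description": "Browse the genome in JBrowse2",
     "resource_type": "genome browser", "genome_version": "gnm2"},
    {"name": "Example ZZBrowse entry",
     "description": "ZZBrowse viewer",
     "resource_type": "specialized browser", "genome_version": "gnm1"},
    {"name": "Example annotation entry",
     "description": "Gene models for the assembly",
     "resource_type": "annotation", "genome_version": "gnm2"},
]

def is_genome_browser_by_description(resource):
    """Current heuristic: the description contains the text "Browse"."""
    return "Browse" in resource.get("description", "")

def is_genome_browser_by_type(resource):
    """Alternative: rely on an explicit (hypothetical) resource_type field."""
    return resource.get("resource_type") == "genome browser"

def group_by_genome_version(records):
    """Group resource names by a (hypothetical) genome_version field."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("genome_version", "unknown")].append(record["name"])
    return dict(groups)

by_description = [r["name"] for r in resources if is_genome_browser_by_description(r)]
by_type = [r["name"] for r in resources if is_genome_browser_by_type(r)]
by_version = group_by_genome_version(resources)
```

With these sample records, the substring test also matches the ZZBrowse entry (its description contains "Browse"), while the explicit type field lets the page decide which browsers to include without parsing names or descriptions.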
Just a reminder, this is ready for feedback.
Current version: https://dev.peanutbase.org/genome/index-sgr.html Let me know what you think.
Looked at ....jekyll-peanutbase/genome/index-sgr.html and the result https://dev.peanutbase.org/genome/index-sgr.html with respect to _data/taxa/Arachis/species_resources.yml
It definitely serves the purpose of generating genome page via autocontent for PeanutBase.
Next steps:
I think we are ready to start a conversation on how to specify items in the metadata files for use by the genome page. (Thus, in the PB-Jekyll meeting, let us talk among ourselves and then propose it in the next group meeting: Legumista or LIS-PB, for LIS in general.)
Done. Note that the introductory text ("This page describes ... cultivated peanut itself.") and the A. hypogaea details ("Changes between the ... Additional details about ... in the chromosomes") are hard-coded for now.
https://www.peanutbase.org/genome/ has the dev site banner
not sure we need to keep a separate banner for the dev site, but if we do, we shouldn't let it bleed into prod.
It looks like this only appears in production (on the Species page as well). I will try to fix them.
Rebuilding the production site made the "Development" go away.
There is now an alternative Peanut Genome page, generated on the fly from _data/genomes.yml with the template genome/index-sgr.html. Let me know if that is on the right track.
Current hard-coded Peanut Genome page, for comparison.
Optional fields: details, strain, annotation.
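As a rough illustration of how the optional fields could be handled, here is a small Python sketch that mirrors what a template would do. Only details, strain, and annotation come from the description above; the required name field, the sample values, and the output format are assumptions for illustration.

```python
OPTIONAL_FIELDS = ("strain", "annotation", "details")

def describe_genome(entry):
    """Build a one-line summary, skipping optional fields that are absent or empty."""
    parts = [entry["name"]]  # "name" as a required field is an assumption
    for field in OPTIONAL_FIELDS:
        value = entry.get(field)
        if value:
            parts.append(f"{field}: {value}")
    return "; ".join(parts)

# Hypothetical entries; the values are placeholders, not real metadata.
full_entry = {"name": "Example genome", "strain": "ExampleStrain",
              "annotation": "ann1", "details": "Example details."}
minimal_entry = {"name": "Example genome"}

full_summary = describe_genome(full_entry)
minimal_summary = describe_genome(minimal_entry)
```

An entry without optional fields renders as the name alone, so the template does not need a separate case for each combination of missing fields.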