Closed erikrikarddaniel closed 4 years ago
The list of representatives is in a file called sp_clusters.tsv (not gtdb_metadata.tsv). My script reads representatives directly from sp_clusters.tsv and does not use gtdb_metadata.tsv.
OK. Then we need to add a download of this file to data/Makefile.
We need to download it and also publish it to results. Maybe we can add this file to the GetMetadata process in main.nf? Or should it have its own process?
Since this is not part of the workflow, it only needs to be downloaded by the build process in this repo. (Another question is whether it would be better as part of the workflow, but let's leave it as it is for now.)
Added to the Makefile.
Add an R script (i.e. not an Rmd) to the scripts directory that reads all the feather files and the gtdb_metadata.tsv and outputs new feather files that only contain data referring to the species representative genomes.
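For reference, a minimal sketch of what such a script could look like, taking the representatives from sp_clusters.tsv as discussed above. Several things here are assumptions and not confirmed anywhere in this thread: the function name `filter_to_representatives`, the genome accession column in the feather files (here `accno`), the `Representative genome` column header in sp_clusters.tsv, and the use of the `arrow` package for feather I/O. Adjust them to whatever the actual files contain.

```r
# Sketch only: keep rows that refer to species representative genomes.
# Assumed, not confirmed: column names `accno` and "Representative genome",
# and that all feather files sit in a single directory.
library(arrow)

filter_to_representatives <- function(feather_dir, sp_clusters_path, out_dir,
                                      id_col = "accno",
                                      rep_col = "Representative genome") {
  # check.names = FALSE keeps header names with spaces intact
  clusters <- read.delim(sp_clusters_path, check.names = FALSE,
                         stringsAsFactors = FALSE)
  stopifnot(rep_col %in% names(clusters))
  reps <- unique(clusters[[rep_col]])

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(feather_dir, pattern = "\\.feather$", full.names = TRUE)

  for (f in files) {
    d <- as.data.frame(arrow::read_feather(f))
    if (!id_col %in% names(d)) {
      # No genome column to filter on, so leave this file out
      message("Skipping ", basename(f), ": no column '", id_col, "'")
      next
    }
    kept <- d[d[[id_col]] %in% reps, , drop = FALSE]
    # Write the filtered table under the same file name in out_dir
    arrow::write_feather(kept, file.path(out_dir, basename(f)))
  }
  invisible(files)
}
```

It could be called as, for example, `filter_to_representatives("data", "data/sp_clusters.tsv", "data/representatives")`, where all three paths are placeholders. Note that the filter is an exact match, so the accessions in the feather files need to be in the same format as those in sp_clusters.tsv; if they differ (for instance by a prefix), they would have to be normalised first.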