Open jessicalumian opened 1 year ago
Hello I have some questions!
and I have answers!
1. How can I modify the gather hashes vs mapped bp graph to show more than 60 genomes?
The reports are generated from template notebooks in genome_grist/notebooks that are filled in and executed. The filled-in notebooks are available in outputs.*/reports/*.ipynb, and you can actually run them directly from there and modify them.
In this case you want report-mapping-{sample}.ipynb. You should be able to modify the number 60 at the top of it; see NUM=60.
If there are things we can do to make this notebook easier to edit let me know :). Haven't paid much attention to it in a while...
2. How can I see the number of hashes that don't match anything in GTDB?
See outputs.*/{sample}.yaml. The unknown_hashes is what you want. See also total_hashes and known_hashes.
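If you want that as a fraction rather than a raw count, something like the sketch below might help. It assumes the three counts appear in the YAML file as simple top-level `key: integer` lines, which I have not verified against every genome-grist version; the helper names `read_flat_counts` and `unknown_fraction` are made up for this example. If the file is nested, use a real YAML parser instead.

```python
import re


def read_flat_counts(path):
    """Collect top-level `key: integer` lines into a dict.

    ASSUMPTION: the counts are flat top-level entries in the file.
    Lines that do not look like `key: <integer>` are skipped.
    """
    pattern = re.compile(r"^(\w+):\s*(\d+)\s*$")
    counts = {}
    with open(path) as fh:
        for line in fh:
            match = pattern.match(line)
            if match:
                counts[match.group(1)] = int(match.group(2))
    return counts


def unknown_fraction(counts):
    """Fraction of all hashes that matched nothing in the database(s)."""
    total = counts["total_hashes"]
    return counts["unknown_hashes"] / total if total else 0.0


# Example use (the path is a placeholder, fill in your own output dir and sample):
# counts = read_flat_counts("outputs.example/SAMPLE.yaml")
# print(unknown_fraction(counts))
```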
3. Can I get the answers to 1 and 2 if I am using the GTDB database and providing another database in the same run?
The numbers will be calculated with respect to the combined databases.
Bonus question:
Is there an easy way to find out how much of a specific genome is covered across different runs of genome-grist? Say I am looking for microbe X in five different microbiome samples, and I want to know how many hashes match microbe X and what percentage of its genome is covered in each sample. I imagine I could look at the report graphs, but I'm wondering if there's another way.
hmm. ...yes... if I understand your question correctly...
outputs.*/gather/{sample}.gather.csv will contain the sourmash/hash information. You're looking for one of the columns f_orig_query, f_match, f_unique_to_query, f_unique_weighted, average_abund, median_abund, std_abund, f_match_orig, unique_intersect_bp for the row where name matches your desired microbe.
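A minimal sketch of pulling that row out with the standard library, in case it saves some spreadsheet scrolling. The function name `gather_rows_for` is made up for this example, and the case-insensitive substring match on `name` is my choice rather than anything genome-grist does; tighten it if several genomes share part of a name.

```python
import csv


def gather_rows_for(gather_csv, query):
    """Return the gather rows whose `name` contains `query` (case-insensitive)."""
    query = query.lower()
    with open(gather_csv, newline="") as fh:
        return [row for row in csv.DictReader(fh) if query in row["name"].lower()]


# Example use (the path is a placeholder, fill in your own output dir and sample):
# for row in gather_rows_for("outputs.example/gather/SAMPLE.gather.csv", "microbe x"):
#     print(row["name"], row["f_match"], row["unique_intersect_bp"])
```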
For the mapping coverage, look at outputs.*/mapping/{sample}.summary.csv. You're looking for f_covered_bp.
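To line that number up across several samples, a rough sketch along these lines could work. Only `f_covered_bp` comes from the answer above; the name of the column that identifies each genome is an assumption here (`genome_id` is a guess, passed in as `id_column`), so check the header of your own summary file first. The function name `coverage_by_sample` is made up for this example.

```python
import csv


def coverage_by_sample(summary_paths, genome, id_column="genome_id"):
    """Map sample name -> f_covered_bp for one genome.

    summary_paths: dict of sample name -> path to that sample's summary CSV.
    id_column: ASSUMED name of the column that identifies each genome.
    Samples where the genome does not appear get None.
    """
    result = {}
    for sample, path in summary_paths.items():
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                if row[id_column] == genome:
                    result[sample] = float(row["f_covered_bp"])
                    break
            else:
                # Loop finished without a match: genome absent from this sample.
                result[sample] = None
    return result


# Example use (paths and identifier are placeholders):
# paths = {s: f"outputs.example/mapping/{s}.summary.csv" for s in ["S1", "S2"]}
# print(coverage_by_sample(paths, "MICROBE_X_ID"))
```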
There are some details - like whether you want the stats for the metagenome x genome, or leftover metagenome x genome - but first I'd suggest that you go get confused by what's there and then come back and ask questions ;)
p.s. great questions!