KoslickiLab / YACHT

A mathematically characterized hypothesis test for organism presence/absence in a metagenome
MIT License

PR for MAG fishing #122

Open bioinfwithjudith opened 2 months ago

bioinfwithjudith commented 2 months ago

This is a draft. I would like to add a description of the paper this data came from. I had some issues along the way that might be worth noting for new users.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 85.18%. Comparing base (1b9297b) to head (edb95cf).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #122   +/-   ##
=======================================
  Coverage   85.18%   85.18%
=======================================
  Files          11       11
  Lines        1080     1080
=======================================
  Hits          920      920
  Misses        160      160
```


dkoslicki commented 2 months ago

Thanks! Let me know when it's ready for review

bioinfwithjudith commented 2 months ago

Updated description for README.md

Question: The NCBI accession is PRJNA924243, and it has >50 MAG samples. Currently, the README.md instructs users to concatenate these MAGs so they can be used as input for YACHT.

sonarcloud[bot] commented 2 months ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

dkoslicki commented 2 months ago

Hmmm... @bioinfwithjudith I might need a bit more context. What do you mean by "50 MAG samples"? Do you mean MAGs derived from 50 different metagenomes, or 50 different assemblies from a single genome? If the latter, then concatenating them would make sense if you wanted to answer "have any of these MAGs shown up in this other sample?", whereas keeping them separate would answer "is this particular MAG occurring in this other sample?" So it just depends on what you are trying to answer.
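
To make the two options concrete, here is a minimal sketch in plain Python. It is not part of YACHT or of this PR. The function names `concatenate_mags` and `keep_separate`, and the `|` separator in the headers, are hypothetical choices for illustration only. Concatenating produces one combined reference (the "any of these MAGs" question), while keeping the files separate gives one reference per MAG (the "this particular MAG" question).

```python
from pathlib import Path


def concatenate_mags(mag_paths, out_path):
    """Merge several MAG FASTA files into one file.

    Each header is prefixed with the source file's stem so the origin of
    every contig stays traceable. Returns the number of records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in map(Path, mag_paths):
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # Hypothetical naming scheme: "<MAG stem>|<original header>"
                        line = f">{path.stem}|{line[1:]}"
                        n_records += 1
                    # Guard against a missing trailing newline at end of file.
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_records


def keep_separate(mag_paths):
    """Map each MAG name (file stem) to its own FASTA path, one reference per MAG."""
    return {Path(p).stem: Path(p) for p in mag_paths}
```

Which of the two to feed downstream depends only on the question being asked; the files themselves are the same in both cases.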

bioinfwithjudith commented 2 months ago

@dkoslicki

I see that I am confused.

The paper states "a non-redundant dataset of 58 was compiled, the majority of which (35) belonged to (Alpha- and Gamma-) Proteobacteria" so I am assuming the latter.

I thought I was answering the MAG fishing question, but I am not. Ideally, I would download the metagenomes sampled for this study, take a MAG, and identify the metagenomes that contain it.

I have the MAGs downloaded but I am in the process of figuring out how to download metagenomic samples from MG-RAST.
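
The MAG fishing idea described above (take one MAG, find which metagenomes contain it) can be illustrated with a toy k-mer containment check. This is only a conceptual sketch. It is not how YACHT works internally, since YACHT uses sketches and a statistical hypothesis test, and this version ignores reverse complements and sequencing error. The names `kmers`, `containment`, and `fish_mag`, and the default `k` and `threshold` values, are assumptions made for the example.

```python
def kmers(seq, k=31):
    """Return the set of all k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(mag_kmers, sample_kmers):
    """Fraction of the MAG's k-mers that also occur in the sample."""
    if not mag_kmers:
        return 0.0
    return len(mag_kmers & sample_kmers) / len(mag_kmers)


def fish_mag(mag_seq, metagenomes, k=31, threshold=0.9):
    """Return {sample name: containment} for samples at or above the threshold.

    `metagenomes` maps a sample name to its sequence content as one string;
    real data would be read from FASTA/FASTQ files instead.
    """
    mag = kmers(mag_seq, k)
    hits = {}
    for name, seq in metagenomes.items():
        score = containment(mag, kmers(seq, k))
        if score >= threshold:
            hits[name] = score
    return hits
```

For example, `fish_mag(mag, {"sampleA": seqA, "sampleB": seqB}, k=5)` would return only the samples whose k-mer set covers at least 90% of the MAG's k-mers.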