Request of read number instead of relative taxa abundance for clr normalization

borenstein-lab / microbiome-metabolome-curated-data

MIT License


Request of read number instead of relative taxa abundance for clr normalization #5

Closed: MaximilianBaumgartner closed this issue 9 months ago

MaximilianBaumgartner commented 9 months ago

I would like to apply the CLR (centered log-ratio) transformation to the taxonomic data in order to calculate Aitchison distances and to build FastSpar networks.

In scripts/data_organization/load_original_data/load_data_DATASET.R, the original taxonomy/kraken files, which contain read counts instead of relative abundances, are referenced, e.g.:

'../data/original_data/DATASET/kraken/kraken_species_level_taxonomy.tsv'

Would it be possible to make them available? That would allow CLR normalization, which is not possible on relative abundance values. This would significantly reduce computation time and would be much appreciated!

Cheers, Max

efratmuller commented 9 months ago

Hi Max,

I've added read count tables to each dataset folder. You'll now see a genera.tsv file that holds relative abundances as before, and a genera.counts.tsv file with read counts. The same goes for species, where available. There have also been small updates to the mappings of metabolite identifiers in HMDB/KEGG, so expect slight differences in the mtb.map files as well. Let me know if you run into any problems.

Cheers, Efrat
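
For readers landing on this thread, here is a minimal sketch of what the CLR transform and the Aitchison distance look like once read counts are available. It is an illustration only, not code from this repository: the function names `clr` and `aitchison_distance`, the pseudocount of 0.5 used to handle zero counts, and the toy count vectors are all assumptions. The layout of genera.counts.tsv is not described in this thread, so the sketch works on plain lists of counts (one list per sample) rather than reading the file.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's read counts.

    The pseudocount (an assumption, not a repository default) is added
    to every count so that zeros do not break the logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between the CLR vectors of two samples."""
    a = clr(counts_a, pseudocount)
    b = clr(counts_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical read counts for four taxa in three samples.
sample_1 = [120, 30, 0, 850]
sample_2 = [240, 60, 0, 1700]
sample_3 = [10, 500, 45, 445]

print(aitchison_distance(sample_1, sample_2))
print(aitchison_distance(sample_1, sample_3))
```

The choice of pseudocount affects the result whenever zeros are present, so it is worth stating it explicitly when reporting distances computed this way.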

MaximilianBaumgartner commented 9 months ago

Hi Efrat,

Wow, thank you so much! I will definitely mention you in the acknowledgments when we write this up in a paper!

All the best, Max