DaliangNing / iCAMP1

Infer Community Assembly Mechanisms by Phylogenetic bin-based null model analysis (Version 1)
GNU General Public License v2.0

Is it necessary to rarefy the count data before starting iCAMP analysis? #49

Closed. sankadinesh closed this issue 6 months ago

sankadinesh commented 9 months ago

Hi there, is it necessary to rarefy the count data to the depth of the sample with the fewest sequences before starting iCAMP analysis?

Regards, Dinesh

DaliangNing commented 9 months ago

If you cannot estimate the absolute abundances of taxa in each sample (e.g., you do not know the total individuals, bacterial biomass, or cell counts per gram of soil), then resampling, i.e., rarefying every sample to the same sequencing depth (sum of counts), is necessary to make different samples comparable and to make ecological null models applicable.

If you can estimate absolute abundances, it is more reasonable to use them. My suggestion: once you have the raw ASV or OTU table (not rarefied to the same sequencing depth), calculate a resampling depth for each sample. This depth should be proportional to the total copies of the marker gene per gram (the total abundance sum per gram of sample) in that sample. Then resample, e.g., with rrarefy in the vegan package. After resampling, the ratio of abundance sums between two samples equals the ratio of copies per gram in those two samples. You then have a matrix containing only integers (suitable for null models), in which the value of each ASV (or OTU) in each sample reflects its absolute abundance.
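
The proportional resampling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not iCAMP or vegan code: it uses plain Python instead of rrarefy, the helper names `proportional_depths` and `rarefy` are hypothetical, and the sample names, ASV counts, and copies per gram are made-up numbers. Depths are scaled so that no sample is asked for more reads than it actually has.

```python
import random


def proportional_depths(copies_per_gram, reads_per_sample):
    """Target depth per sample, proportional to marker gene copies per gram.

    Depths are scaled so that no sample needs more reads than it has.
    Integer arithmetic keeps the depths exact (rounded down).
    """
    # The sample with the fewest reads per copy limits the overall scale.
    limiting = min(
        copies_per_gram,
        key=lambda s: reads_per_sample[s] / copies_per_gram[s],
    )
    return {
        s: copies_per_gram[s] * reads_per_sample[limiting] // copies_per_gram[limiting]
        for s in copies_per_gram
    }


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's ASV counts."""
    # Expand the counts into one label per read, then subsample the reads.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    rarefied = dict.fromkeys(counts, 0)
    for asv in rng.sample(pool, depth):
        rarefied[asv] += 1
    return rarefied


# Hypothetical inputs: raw (unrarefied) ASV counts and qPCR copies per gram.
raw_table = {
    "A": {"asv1": 500, "asv2": 300, "asv3": 200},
    "B": {"asv1": 100, "asv2": 700, "asv3": 0},
    "C": {"asv1": 400, "asv2": 400, "asv3": 400},
}
copies_per_gram = {"A": 2_000_000_000, "B": 1_000_000_000, "C": 4_000_000_000}

reads_per_sample = {s: sum(c.values()) for s, c in raw_table.items()}
depths = proportional_depths(copies_per_gram, reads_per_sample)

rng = random.Random(42)  # fixed seed so the example is reproducible
resampled = {s: rarefy(raw_table[s], depths[s], rng) for s in raw_table}
sums = {s: sum(c.values()) for s, c in resampled.items()}

print("target depths:", depths)
print("sums after resampling:", sums)
```

In this toy example the sums after resampling stand in the same 2:1:4 ratio as the copies per gram, and every value stays an integer.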

sankadinesh commented 8 months ago

Dear Daliang, thanks for your reply. I rarefied the ASV table based on read depth and used it to calculate absolute abundances. I used these absolute abundances as they are for the iCAMP analysis. Is this approach correct?

Regards, Dinesh

DaliangNing commented 8 months ago

To perform a rarefaction that reflects 'absolute abundance', you should determine the target sequence number of each sample based on the total biomass, total cell number, or total marker gene copies of your target microorganisms per gram of sample. The rarefaction is not simply based on read depth. If you are sure the rarefied reads of each ASV in each sample reflect its absolute abundance in that sample, then yes, it is good to use.

sankadinesh commented 8 months ago

Thanks a lot.

It would be great to hear your opinion on the strategy I am using to obtain absolute abundances for iCAMP analysis.

I converted the raw ASV data to qPCR-corrected data through the steps below. I divided the qPCR-corrected data by 10^6 to reduce the absolute abundance values per ASV (which also reduces the sample-wise sums), since sums above 10^12 can cause trouble in iCAMP as well as in other analyses (PICRUSt2).

Raw ASV data -> step 1: 16S copy number normalisation (ASV-wise 16S copy numbers) -> step 2: qPCR normalisation (sample-wise copy number values), giving qPCR-corrected data -> divide by 10^6 -> rarefaction to the sample with the lowest absolute abundance sum: 429036 -> input for iCAMP

Thank you once again.

Regards, Dinesh

DaliangNing commented 6 months ago

You need to make sure that, after rarefaction, the sums of different samples are not the same but proportional to their total copy numbers, which represent the total abundance sums of the different samples.
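
One way to check this condition is sketched below, as an illustration only. The function name `sums_are_proportional`, the 1% tolerance, the sample names, and the copy numbers are all assumptions made for the example; only the depth 429036 comes from the thread. The check compares the ratio of each sample's sum to its total copy number and reports whether those ratios agree.

```python
def sums_are_proportional(sample_sums, total_copies, rel_tol=0.01):
    """Return True if sample sums are proportional to total copy numbers.

    Proportional means sum / copies is (nearly) the same for every sample.
    """
    ratios = [sample_sums[s] / total_copies[s] for s in sample_sums]
    lowest, highest = min(ratios), max(ratios)
    return (highest - lowest) <= rel_tol * highest


# Hypothetical total copy numbers for two samples (S2 has twice as many).
total_copies = {"S1": 429_036_000_000, "S2": 858_072_000_000}

# Every sample rarefied to the same depth: sums are equal, not proportional.
equal_sums = {"S1": 429036, "S2": 429036}

# Depths chosen in proportion to the copy numbers.
proportional_sums = {"S1": 429036, "S2": 858072}

print(sums_are_proportional(equal_sums, total_copies))
print(sums_are_proportional(proportional_sums, total_copies))
```

Rarefying every sample to the lowest sum fails this check whenever the samples differ in total copy number, because it makes all sums equal.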