Performing differential OTU using nanoclust data

kcmtest commented 3 years ago

The pipeline ran successfully after I removed the samples that were causing errors:
executor >  local (6046)
[a2/6ffd4c] process > QC (69)                         [100%] 73 of 73 ✔
[c6/b1e07e] process > fastqc (73)                     [100%] 73 of 73 ✔
[0b/655df7] process > kmer_freqs (67)                 [100%] 73 of 73 ✔
[1c/1ca964] process > read_clustering (71)            [100%] 73 of 73 ✔
[79/e62849] process > split_by_cluster (73)           [100%] 73 of 73 ✔
[a9/c29c19] process > read_correction (1046)          [100%] 1048 of 1048 ✔
[27/40b6e0] process > draft_selection (1048)          [100%] 1048 of 1048 ✔
[3a/e5d1e2] process > racon_pass (1048)               [100%] 1048 of 1048 ✔
[8b/936ff4] process > medaka_pass (1048)              [100%] 1048 of 1048 ✔
[a2/d78685] process > consensus_classification (1048) [100%] 1050 of 1050, failed: 2, retries: 2 ✔
[42/532a74] process > join_results (73)               [100%] 73 of 73 ✔
[b3/746fed] process > get_abundances (73)             [100%] 73 of 73 ✔
[db/dd86da] process > plot_abundances (292)           [100%] 292 of 292 ✔
[84/796ce6] process > output_documentation            [100%] 1 of 1 ✔
[nf-core/nanoclust] Pipeline completed successfully
WARN: [nf-core/nanoclust] Could not attach MultiQC report to summary email
Completed at: 20-May-2021 20:14:01
Duration    : 1h 57m 23s
CPU hours   : 97.6 (0% failed)
Succeeded   : 6'044
Failed      : 2

Previously I ran kraken2, generated OTU tables for various classes, and then performed differential OTU analysis with DESeq2, since the output was raw counts.

How can I do the same with the NanoCLUST output? It gives relative abundances.

Any suggestions on how to go about this?

mansi-aai commented 1 year ago

@kcmtest Did you find a way to get the OTU count table? I also need that table to compute alpha and beta diversity. Thank you!

timyerg commented 8 months ago

If anyone is still wondering: in the *_nanoclust_out.txt file, the "reads_in_cluster" column is the one used to calculate relative abundances at the species level. It can be used for alpha/beta diversity and for differential abundance tests that require raw/absolute counts (LEfSe is fine with relative abundances). To get counts for lower ranks, add the full taxonomy as columns and collapse the counts.

Here is the function the developers used to calculate relative abundance:

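import pandas as pd

def get_abundance_values(names,paths):
    dfs = []
    for name,path in zip(names,paths):
        # Read the per-sample table (';'-separated) and drop its first column
        data = pd.read_csv(path, index_col=False, sep=';').iloc[:,1:]

        # Total number of reads assigned to clusters in this sample
        total = sum(data['reads_in_cluster'])
        rel_abundance=[]

        # Relative abundance = reads in the cluster / total clustered reads
        for index,row in data.iterrows():
            rel_abundance.append(row['reads_in_cluster'] / total)

        data['rel_abundance'] = rel_abundance
        dfs.append(pd.DataFrame({'taxid': data['taxid'], 'rel_abundance': rel_abundance}))
        # Write the table, now including rel_abundance, next to the raw counts
        data.to_csv("" + name + "_nanoclust_out.txt")

    return dfs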
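
Since the function above only divides "reads_in_cluster" by the per-sample total, a raw count table can be assembled from the same column. The following is a minimal sketch, not part of NanoCLUST: the helper names build_count_table and collapse_counts, the sample names, the example counts, and the taxid-to-genus mapping are all hypothetical. It assumes each sample's table has already been loaded into a DataFrame with "taxid" and "reads_in_cluster" columns (check the separator of your own files before calling pd.read_csv).

```python
import pandas as pd

def build_count_table(samples):
    """Build a taxid x sample matrix of raw read counts.

    samples: dict mapping a sample name to a DataFrame that has
    'taxid' and 'reads_in_cluster' columns (one row per cluster).
    """
    # Several clusters can share a taxid, so sum their reads per sample.
    per_sample = {
        name: df.groupby("taxid")["reads_in_cluster"].sum()
        for name, df in samples.items()
    }
    # Align samples on the union of taxids; absent taxa get a count of 0.
    counts = pd.DataFrame(per_sample).fillna(0).astype(int)
    counts.index.name = "taxid"
    return counts

def collapse_counts(counts, taxid_to_group):
    """Sum rows whose taxids map to the same group label (e.g. a genus).

    taxid_to_group: dict from taxid to label; unmapped taxids are dropped.
    """
    return counts.groupby(taxid_to_group).sum()

# Hypothetical example data standing in for two loaded samples.
sample_a = pd.DataFrame({"taxid": [562, 1280, 562],
                         "reads_in_cluster": [100, 50, 25]})
sample_b = pd.DataFrame({"taxid": [1280, 1282, 1423],
                         "reads_in_cluster": [40, 5, 10]})

counts = build_count_table({"sample_a": sample_a, "sample_b": sample_b})

# Hypothetical taxid -> genus lookup; in practice this would come from
# whatever taxonomy source you use to annotate the taxids.
genus_of = {562: "Escherichia", 1280: "Staphylococcus",
            1282: "Staphylococcus", 1423: "Bacillus"}
genus_counts = collapse_counts(counts, genus_of)
print(counts)
print(genus_counts)
```

The resulting taxa-by-sample table of integers is the usual input shape for count-based tools such as DESeq2 or for diversity calculations; whether cluster read counts are appropriate for a given statistical method is a separate question this sketch does not address.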

genomicsITER / NanoCLUST