frederic-mahe / mumu

C++ implementation of lulu, an R package for post-clustering curation of metabarcoding data
GNU General Public License v3.0

warning: one of these is not in the OTU table (#2) #9

Closed hjarnek closed 2 weeks ago

hjarnek commented 3 weeks ago

Hi @frederic-mahe,

I know there is a similar closed issue, but this case seems to be a different problem.

I have exported an ASV table from DADA2 into a txt file that looks like this:

    sample1 sample2 sample3
ASV1    10186   12137   10657
ASV2    8142    9047    9449
ASV3    3793    5887    4494
...

Neither column names nor row names are quoted, and there is a single leading \t on the first line.

There are no whitespaces or non-ASCII characters in the ASV names, just "ASV" followed by a number. The FASTA file looks like this:

>ASV1
ACTAAGTAATG...
>ASV2
GATAGCTTCAA...
>ASV3
ACTGTCGTCAG...
...

And the match list is produced with VSEARCH as suggested in the mumu manual:

ASV3    ASV1201 96.8
ASV1    ASV938  95.5
ASV6    ASV265  87.5
...

Still, when I run it (mumu -o COI.fasta -m match.list -l COI_log.txt -n COI_seqtab_mumu.txt), I get a warning like "warning: one of these is not in the OTU table: ASV1404 ASV1243 85.3" for every entry in the match list. The log file comes out empty. Do you have any idea what's going on? I'm running mumu 1.0.2 in a native Linux environment.

frederic-mahe commented 2 weeks ago

Hi @hjarnek, sorry for the delay, I was on vacation.

I've created a small example with the same characteristics as your dataset (mainly, an empty first cell, which mumu accepts) but I could not reproduce your issue:

    sample1
ASV1    10
ASV2    1

ASV1    ASV2    96.8

mumu \
    --otu_table <( printf "\tsample1\nASV1\t10\nASV2\t1\n") \
    --match_list <(printf "ASV1\tASV2\t96.8\n") \
    --new_otu_table /dev/null \
    --log /dev/null

mumu outputs no warning and merges the two ASVs:

parse OTU table... done, 2 entries
parse match list... done
sort lists of matches... done
search for potential parent OTUs... done
merge OTUs... done
update spread values... done
write new OTU table... done, 2 entries

The error message you get indicates that the ASV names in the table somehow differ from those in the match list. I am not sure what that difference could be (non-printable characters maybe?). You could check your files with a hexadecimal viewer such as xxd or od:

printf "ASV1\tASV2\t96.8\n" | xxd | head
00000000: 4153 5631 0941 5356 3209 3936 2e38 0a    ASV1.ASV2.96.8.

Or, if you could share your input files with me, I could run my own tests to find what's wrong.
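The same name comparison can also be scripted. The following is a minimal Python sketch, not part of mumu: the helper names (table_names, missing_names, suspicious_chars) are hypothetical, and it assumes a tab-separated table with one header line and a tab-separated match list. It lists match-list names that are absent from the first column of the table and flags non-printable or non-ASCII characters in a name. The last example feeds it FASTA-like text in place of a table, in which case no ASV name can be found. This only illustrates the symptom; it does not reproduce mumu's own parsing.

```python
def table_names(table_text):
    """Return the first-column names of a tab-separated table, skipping the header line."""
    # split on "\n" only, so that a stray "\r" stays attached to the name
    lines = table_text.split("\n")
    return {line.split("\t")[0] for line in lines[1:] if line}


def missing_names(table_text, match_text):
    """Return the sorted match-list names that are not row names of the table."""
    names = table_names(table_text)
    missing = set()
    for line in match_text.split("\n"):
        if not line:
            continue
        query, hit = line.split("\t")[:2]
        for name in (query, hit):
            if name not in names:
                missing.add(name)
    return sorted(missing)


def suspicious_chars(name):
    """Return the repr of each character that is not printable ASCII."""
    return [repr(c) for c in name if not (c.isascii() and c.isprintable())]


# toy inputs modelled on the example above
table = "\tsample1\nASV1\t10\nASV2\t1\n"
matches = "ASV1\tASV2\t96.8\n"
fasta = ">ASV1\nACTAAG\n>ASV2\nGATAGC\n"  # a FASTA file given by mistake

print(missing_names(table, matches))  # nothing missing with a real table
print(missing_names(fasta, matches))  # every name is missing with a FASTA file
print(suspicious_chars("ASV1\r"))     # a Windows line ending hidden in a name
```

An empty list from missing_names means every name in the match list was found in the table.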

hjarnek commented 2 weeks ago

Frédéric, I'm sorry to bother you. I must have been too tired when I ran mumu: I apparently passed the FASTA file in place of the ASV table, as can actually be seen in the command in my original post above. Your program is working like a charm, of course. I'm closing this "issue" now before someone else sees it 🤦‍♂️

frederic-mahe commented 2 weeks ago

Ah, mumu -o COI.fasta ..., I should have spotted it too. Thanks for trying mumu!

frederic-mahe commented 2 weeks ago

Hi @hjarnek, I've implemented a test that warns users when the OTU table does not seem to contain samples (i.e., when the input is not a table). That should at least help users understand what's wrong with their data. The test is now included in mumu (see https://github.com/frederic-mahe/mumu/commit/136d738e46534ff51451f75a56e148b9a1abc8eb).
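For readers who want a similar sanity check in their own pipelines, here is a rough Python sketch of the idea. It is an assumption about what such a check could look like, not the code from the commit above: the function name looks_like_otu_table is hypothetical, and the heuristic simply requires the header line to contain at least one tab, meaning at least one sample column after the first cell.

```python
def looks_like_otu_table(text):
    """Heuristic check: the header line must have at least one sample column.

    Assumes a tab-separated table whose first header cell may be empty.
    This is an illustrative guess, not mumu's actual test.
    """
    header = text.split("\n", 1)[0]
    columns = header.split("\t")
    return len(columns) >= 2


otu_table = "\tsample1\nASV1\t10\nASV2\t1\n"
fasta = ">ASV1\nACTAAG\n>ASV2\nGATAGC\n"

for label, content in (("OTU table", otu_table), ("FASTA file", fasta)):
    if looks_like_otu_table(content):
        print(f"{label}: header has sample columns")
    else:
        print(f"{label}: warning, no sample columns found (not a table?)")
```

A FASTA header line such as ">ASV1" contains no tab, so it fails the check, which is the mistake discussed in this thread.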

hjarnek commented 1 week ago

Great! I promise I won't make this mistake again ;)