MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Minimal-but-complete set of example files, say, below 1 MB, to work with the supplied config file? #93

Closed by richelbilderbeek 8 months ago

richelbilderbeek commented 8 months ago

Dear MetONTIIME maintainers and fellow users,

To get MetONTIIME running, it would be useful to have a small yet complete example dataset that works with the supplied config file.

Currently, Zymo-GridION-EVEN-BB-SN_sup_pass_filtered_27F_1492Rw_1000_reads.fastq.gz is supplied (good job!), but what is lacking is:

Does anyone have a small but usable dataset to help get MetONTIIME running?

Thanks and cheers, Richel

MaestSi commented 8 months ago

Hi,

Simone

richelbilderbeek commented 8 months ago

The example dataset linked from the README under the 'test-dataset' header, at https://nanopore.s3.climb.ac.uk/Zymo-GridION-EVEN-BB-SN_signal.tar, is 260 GB in size. I'd say that does not qualify as a minimal-but-full example dataset :-)

Likewise for the others...

Does anyone know of a minimal-but-full example dataset, as in below 1 MB?

MaestSi commented 8 months ago

The example test dataset is indeed this one. SM

richelbilderbeek commented 8 months ago

Thanks @MaestSi, this is indeed one of the files. As a set of files for running MetONTIIME, however, it is incomplete, as it misses:

Does anyone know of a minimal-but-complete example dataset, below 1 MB (but hey, 10 would be fine too), so anyone can test-run MetONTIIME?

MaestSi commented 8 months ago

For example, you can use a fasta file with B. subtilis 16S sequence:

>NR_116192.1 Bacillus subtilis strain NRRL NRS-744 16S ribosomal RNA, partial sequence
GGATAACTCCGGGAAACCGGGGCTAATACCGGATGGTTGTTTGAACCGCATGGTTCAAACATAAAAGGTG
GCTTCGGCTACCACTTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGC
AACGATGCGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGG
GAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGG
TTTTCGGATCGTAAAGCTCTGTTGTTAGGGAAGAACAAGTACCGTTCGAATAGGGCGGTACCTTGACGGT
ACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCC
GGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGG
GGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAA
TGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGA
AAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAG
GGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACT
GAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAA
GAACCTTACCAGGTCTTGACATCCTCTGACAATCCTAGAGATAGGACGTCCCCTTCGGGGGCAGAGTGAC
AGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCT
TGATCTTAGTTGCCAGCATTCAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGG
GATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACAGAACAAAGGGC
AGCGAAACCGCGAGGTTAAGCCAATCCCACAAATCTGTTCTCAGTTCG

And the corresponding taxonomy file:

NR_116192.1 Bacteria;Bacillota;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus subtilis;

(the separator between the id and the taxonomy should be a tab)

That should be enough for checking everything is working.

Best, SM
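Since the tab separator is easy to lose when copying from a web page, here is a minimal Python sketch for checking that the two files agree before starting a run. It only assumes what is stated above (a FASTA header starting with the sequence ID, and a taxonomy line of the form ID, tab, taxonomy); the function names are made up for illustration and are not part of MetONTIIME.

```python
def fasta_ids(fasta_text):
    """Return the sequence IDs (first token after '>') found in FASTA text."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and len(line) > 1
    ]


def taxonomy_ids(tsv_text):
    """Return the IDs of a taxonomy table with lines of the form ID<TAB>taxonomy.

    Raises ValueError when a non-empty line does not have exactly two
    tab-separated fields, e.g. when the tab was replaced by a space.
    """
    ids = []
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected ID<TAB>taxonomy")
        ids.append(fields[0])
    return ids


def ids_match(fasta_text, tsv_text):
    """True when both files describe exactly the same set of sequence IDs."""
    return set(fasta_ids(fasta_text)) == set(taxonomy_ids(tsv_text))
```

Read both files into strings and call `ids_match`; a `ValueError` points to a line where the separator is not a single tab.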

richelbilderbeek commented 8 months ago

Awesome, thanks!

richelbilderbeek commented 8 months ago

I am reopening this issue, as the set of files is still incomplete: running it results in the error "Ordinations with less than two dimensions are not supported."

I will try to add a line to the FASTA or taxonomy file and see if that solves the problem :innocent:

MaestSi commented 8 months ago

I guess that is due to the fact that beta-diversity analysis is not possible with a single sample. You may consider splitting the fastq in 2 chunks and see if that solves the issue, or setting diversityAnalyses process to false in the config file. SM
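For the first option, a minimal Python sketch of splitting the gzipped FASTQ into two chunks could look like the following. It assumes standard 4-line FASTQ records and that each resulting file is then treated as its own sample, as the suggestion implies; the function name and the output file names are placeholders.

```python
import gzip


def split_fastq_gz(src, out_a, out_b):
    """Split a gzipped FASTQ file into two gzipped files.

    The first file receives the first half of the reads (rounded down),
    the second file receives the rest. Assumes 4 lines per read.
    Returns the number of reads written to each file.
    """
    with gzip.open(src, "rt") as handle:
        lines = list(handle)  # small test files only: everything is kept in memory
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not a 4-line FASTQ?")
    n_reads = len(lines) // 4
    n_first = n_reads // 2
    cut = n_first * 4  # index of the first line of the second chunk
    with gzip.open(out_a, "wt") as first:
        first.writelines(lines[:cut])
    with gzip.open(out_b, "wt") as second:
        second.writelines(lines[cut:])
    return n_first, n_reads - n_first
```

For example, `split_fastq_gz("reads.fastq.gz", "sample_1.fastq.gz", "sample_2.fastq.gz")` with the supplied 1000-read file in place of `reads.fastq.gz` would give two files of 500 reads each.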

richelbilderbeek commented 8 months ago

I am reopening this issue, as the set of files is still incomplete, even after setting diversityAnalyses process to false: it results in the error (see the GitHub Actions log here):

All features were filtered, resulting in an empty table

I tried to alleviate this by setting clusteringIdentity to 0.1 and minConsensus to 0.51, but still no run passes.

So, does anyone know of a minimal-but-complete example dataset, below 1 MB (but hey, 10 would be fine too), so anyone can test-run MetONTIIME, ideally allowing the diversity analysis?

MaestSi commented 8 months ago

It looks like the reads do not align to the Bacillus subtilis 16S gene sequence that I downloaded from NCBI. You can check this by uploading the file taxonomy.qzv to QIIME2 Viewer. If this is the case, you may consider downloading more fasta sequences from NCBI and creating the tsv files with the script I provided - or just using a real database, which you can easily download from the QIIME2 website.
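As a rough pre-check outside the pipeline, one could also estimate whether a read resembles the reference at all by counting shared k-mers. This is only a crude stand-in for alignment, not what MetONTIIME or QIIME2 does internally; the function names and the default k of 12 are arbitrary choices for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(read, reference, k=12):
    """Fraction of the read's distinct k-mers that occur in the reference.

    Both strands of the reference are considered, since reads may come
    from either strand. Returns 0.0 for reads shorter than k.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    return len(read_kmers & ref_kmers) / len(read_kmers)
```

A fraction near zero for most reads would be consistent with the reads not matching the reference; taxonomy.qzv remains the authoritative check.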