bdaisley / isolateR

Automated processing of Sanger sequencing data, taxonomic profiling, and generation of microbial strain libraries

FASTA as input for `isoTAX`? #9

Closed: lxsteiner closed this issue 1 week ago

lxsteiner commented 1 month ago

Hi,

This looks like such a useful wrapper for handling large collections of Sanger sequences, thank you for publishing it!

I've read the tutorial and manual, but I was wondering whether it would be possible to pass entries built only from FASTA sequences (not .ab1 files) into isoTAX, as if they were isoQC output. Alternatively, what would be a workaround for samples where only FASTA sequences exist (e.g., making up mock .ab1 quality values, turning them into .ab1 files, and processing those in isoQC)?

The motivation is this: in-house, we of course have the .ab1 files from which our FASTA sequences were extracted and then used for taxonomic identification and so on. But to include sequences from other labs in the same collection/pipeline (e.g., for taxonomic identity), usually only FASTA sequences are available and made public.

It would be great if this were possible entirely within isolateR. Otherwise it becomes a chore again: process our own .ab1 samples here, build collections, export FASTA, add the external FASTA collections, redo taxonomic identification with some other tool, and summarize the taxonomy ourselves.

Do you see any workaround at the moment, or would you consider implementing a similar feature in the future?

Thanks.

bdaisley commented 1 month ago

@lxsteiner - Thanks for the feedback, this is a great idea and very much doable. I will add a proper feature within the next week. As an immediate workaround, mocking up .ab1 quality values as you mentioned would work: just add your sequences to an existing isoQC-formatted file, fill the missing columns with mock data, and you should be able to continue on to the isoTAX > isoLIB steps as usual.
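The workaround described above can be sketched in base R as follows. This is a minimal, unofficial sketch. It assumes the isoQC table is a CSV file with one row per sample, with the sample ID and the sequence each stored in a single column. The helper names `read_fasta_simple()` and `append_fasta_to_isoqc()` are hypothetical and not part of isolateR. Because the real isoQC column names are not given in this thread, they are passed in as arguments. Each appended row copies the first existing row as its mock values, so review any quality- or length-related columns before running isoTAX.

```r
# Hypothetical helper (not part of isolateR): minimal FASTA reader in base R.
# Returns a named character vector: names = header text without ">",
# values = the sequence lines joined together and upper-cased.
read_fasta_simple <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                    # drop blank lines
  is_header <- startsWith(lines, ">")
  if (!any(is_header) || !is_header[1]) stop("Not a FASTA file: ", path)
  # Group each header with the sequence lines that follow it
  chunks <- split(lines, cumsum(is_header))
  ids  <- vapply(chunks, function(x) sub("^>", "", x[1]), character(1))
  seqs <- vapply(chunks, function(x) paste(x[-1], collapse = ""), character(1))
  stats::setNames(toupper(seqs), ids)
}

# Hypothetical helper: append FASTA entries to an existing isoQC-formatted CSV.
# ASSUMPTION: id_col and seq_col are the (unknown) isoQC column names holding
# the sample ID and the sequence; all other columns get mock values copied
# from the first existing row.
append_fasta_to_isoqc <- function(isoqc_csv, fasta_path, id_col, seq_col, out_csv) {
  qc <- utils::read.csv(isoqc_csv, stringsAsFactors = FALSE, check.names = FALSE)
  stopifnot(nrow(qc) > 0, id_col %in% names(qc), seq_col %in% names(qc))
  fa <- read_fasta_simple(fasta_path)
  new_rows <- qc[rep(1, length(fa)), , drop = FALSE]  # template = mock values
  new_rows[[id_col]] <- names(fa)
  new_rows[[seq_col]] <- unname(fa)
  out <- rbind(qc, new_rows)
  rownames(out) <- NULL
  utils::write.csv(out, out_csv, row.names = FALSE)
  invisible(out)
}
```

Copying an existing row keeps every column type valid without guessing at the isoQC format. The trade-off is that the mock values carry no real information about the external sequences.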

bdaisley commented 1 month ago

Hi @lxsteiner, just following up on this. I've adjusted the isoTAX function in the latest isolateR package release so that it accepts FASTA files as input, as requested. A brief overview follows:

Example walkthrough

Update to the latest version of isolateR

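# Unload isolateR if it is currently attached, then reinstall it from GitHub
# (requires the devtools package: install.packages("devtools"))
if ("package:isolateR" %in% search()) {detach("package:isolateR", unload=TRUE)}
devtools::install_github("bdaisley/isolateR")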

Example case using a FASTA file containing 16S rRNA gene sequences from human gut isolates

Manual download link for FASTA example: human_gut_isolates_10.fasta

#Download example FASTA file:
download.file("https://github.com/bdaisley/isolateR/raw/main/inst/extdata/fasta_examples/human_gut_isolates_10.fasta", 
              destfile="T:/human_gut_isolates_10.fasta")

#Run isoTAX with the FASTA file as input (Note: 'quick_search=FALSE' is recommended for real-world use)
isoTAX(input="T:/human_gut_isolates_10.fasta", quick_search=TRUE)
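For the original use case (in-house sequences plus FASTA collections from other labs), one option is to merge everything into a single FASTA file before calling isoTAX. The base R sketch below is a hypothetical helper, not an isolateR function. It prefixes each header with a source label so that IDs from different labs cannot collide, and it stops if duplicate IDs remain. The file paths in the usage comments are placeholders.

```r
# Hypothetical helper (not part of isolateR): merge several FASTA files into
# one, prefixing every header with a per-file source label (">s1" becomes
# ">label_s1"). Stops on duplicate IDs; returns the number of sequences.
combine_fasta <- function(paths, labels, out_path) {
  stopifnot(length(paths) == length(labels))
  combined <- character(0)
  for (i in seq_along(paths)) {
    lines <- readLines(paths[i], warn = FALSE)
    lines <- lines[nzchar(trimws(lines))]          # drop blank lines
    hdr <- startsWith(lines, ">")
    lines[hdr] <- paste0(">", labels[i], "_", sub("^>", "", lines[hdr]))
    combined <- c(combined, lines)
  }
  ids <- sub("^>", "", combined[startsWith(combined, ">")])
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) > 0) stop("Duplicate sequence IDs: ", paste(dups, collapse = ", "))
  writeLines(combined, out_path)
  invisible(length(ids))
}

# Usage (placeholder paths):
# combine_fasta(c("T:/in_house.fasta", "T:/external_lab.fasta"),
#               c("inhouse", "ext"), "T:/combined.fasta")
# isoTAX(input = "T:/combined.fasta", quick_search = FALSE)
```

The labels also make it easy to tell in-house and external isolates apart in the downstream isoTAX and isoLIB tables, assuming those tables carry the FASTA header through as the sample name.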

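The above commands will generate the isolateR output files in T:/isolateR_output/, including the mock isoQC table (01_isoQC_mock_table.csv) used in the optional step below.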

Optional: Use the mock isoQC table as input to isoTAX instead

You can manually edit the mock isoQC table and then re-run isoTAX on it. This may be useful if you want to add custom quality values or other metadata that cannot be obtained from a raw FASTA file.
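Such edits can be made in a spreadsheet editor, or scripted in base R as in the sketch below. `edit_mock_isoqc()` is a hypothetical helper, not part of isolateR. The column names used in its usage comment are placeholders, because the actual columns of the mock isoQC table are not listed in this thread. Columns that do not exist yet are created and filled with NA for the rows that are not edited.

```r
# Hypothetical helper (not part of isolateR): overwrite one column of the mock
# isoQC table for selected rows and save the result to out_csv (which may be
# the same path as in_csv to edit the file in place).
# rows: integer row indices; NULL means all rows.
edit_mock_isoqc <- function(in_csv, out_csv, column, values, rows = NULL) {
  tbl <- utils::read.csv(in_csv, stringsAsFactors = FALSE, check.names = FALSE)
  if (is.null(rows)) rows <- seq_len(nrow(tbl))
  stopifnot(length(values) %in% c(1L, length(rows)))
  if (!column %in% names(tbl)) tbl[[column]] <- NA   # create missing column
  tbl[rows, column] <- values
  utils::write.csv(tbl, out_csv, row.names = FALSE)
  invisible(tbl)
}

# Usage (placeholder column name), editing the mock table in place:
# f <- "T:/isolateR_output/01_isoQC_mock_table.csv"
# edit_mock_isoqc(f, f, column = "my_quality_column", values = 30)
# isoTAX(input = f, quick_search = TRUE)
```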

isoTAX(input="T:/isolateR_output/01_isoQC_mock_table.csv", quick_search=TRUE)

If the mock isoQC table is left unedited, this last line of code produces functionally the same output as running isoTAX on the FASTA file directly.

I hope these additions are helpful. Please let me know if any further adjustments are needed!