bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

orient-seqs: accept FASTQ data as input #159

Open nbokulich opened 1 year ago

nbokulich commented 1 year ago

vsearch --orient can accept FASTQ as input (and also output via the --fastqout option). Ideally, orient-seqs (which is just thinly wrapping vsearch --orient) could do the same.
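As a rough illustration of the idea (not the current RESCRIPt implementation), a wrapper could choose the output flag from the input format. The helper names `detect_format` and `build_orient_cmd` are hypothetical, and the sketch assumes uncompressed input. The only vsearch options it uses are `--orient`, `--db`, `--fastaout`, and `--fastqout`.

```python
def detect_format(seqs_fp):
    """Guess 'fasta' or 'fastq' from the first non-empty line (hypothetical helper)."""
    with open(seqs_fp) as fh:
        for line in fh:
            if not line.strip():
                continue
            if line.startswith('>'):
                return 'fasta'
            if line.startswith('@'):
                return 'fastq'
            break
    raise ValueError(f'Could not detect FASTA or FASTQ format: {seqs_fp}')


def build_orient_cmd(seqs_fp, ref_fp, out_fp):
    """Build a vsearch --orient command whose output format matches the input."""
    fmt = detect_format(seqs_fp)
    # FASTQ in -> FASTQ out, FASTA in -> FASTA out.
    out_flag = '--fastqout' if fmt == 'fastq' else '--fastaout'
    return ['vsearch', '--orient', str(seqs_fp),
            '--db', str(ref_fp),
            out_flag, str(out_fp)]
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)`. Gzipped input, which is common for FASTQ artifacts, would need separate handling.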

This would require modifying the inputs/outputs here: https://github.com/bokulich-lab/RESCRIPt/blob/master/rescript/orient.py#L55

HOWEVER, the main issue I see is that the current inputs and outputs are DNAFASTAFormat objects. A FASTQ-formatted input (e.g., coming from some of the SampleData[.*Sequence.*] types) could not have DNAFASTAFormat as a view type. I suppose we need something like a Union[SingleLanePerSamplePairedEndFastqDirFmt | DNAFASTAFormat | ... ] as input and output, and a TypeMap in the plugin registration to accept and output the corresponding types.

colinvwood commented 1 year ago

Links to motivating forum posts here and here for reference.

mirand863 commented 1 year ago

Hi,

Thank you very much for opening this issue! I made some progress today. It is working with a single-end FASTQ file using the type MultiplexedSingleEndBarcodeInSequence, and I believe it would not be too much trouble to allow for other types, i.e., paired-end data and multiple samples inside a folder. However, I am currently running into an "is not complete type expression" error with the TypeMap. I will try to debug this further another day.

Best regards, Fabio

VinzentRisch commented 3 weeks ago

Hi @nbokulich, I do not understand how this would work. As I understand the vsearch functionality, FASTA input yields FASTA output, and likewise FASTQ input yields FASTQ output. In QIIME 2 it is not possible for the same output to have different directory formats. So the only way I see is to have two outputs, one of format DNAFASTAFormat and one of CasavaOneEightSingleLanePerSampleDirFmt, and depending on the input one of them would be empty. Or am I missing something here?