bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

orient-seqs: accept FASTQ data as input #159

Open nbokulich opened 11 months ago

nbokulich commented 11 months ago

vsearch --orient can accept FASTQ as input (and can also write FASTQ output via the --fastqout option). Ideally, orient-seqs (which is just a thin wrapper around vsearch --orient) could do the same.
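To make the idea concrete, here is a minimal sketch of how a wrapper might pick the output flag based on the input format. The function name build_orient_cmd and the extension-based format check are assumptions for illustration, not what rescript/orient.py currently does; only the vsearch options --orient, --db, --fastaout and --fastqout are used.

```python
from pathlib import Path


def build_orient_cmd(seqs, reference_db, oriented_out):
    """Build a `vsearch --orient` command as an argument list.

    Hypothetical helper: the output flag is chosen from the input file
    extension, which assumes the extension reliably reflects the format.
    """
    name = Path(seqs).name.lower()
    is_fastq = name.endswith((".fastq", ".fq"))
    # FASTQ input keeps its quality scores via --fastqout;
    # anything else is written as FASTA via --fastaout.
    out_flag = "--fastqout" if is_fastq else "--fastaout"
    return [
        "vsearch",
        "--orient", str(seqs),
        "--db", str(reference_db),
        out_flag, str(oriented_out),
    ]
```

The resulting list could be passed to subprocess.run(cmd, check=True); that call is left out here so the sketch does not depend on vsearch being installed.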

This would require modifying the inputs/outputs here: https://github.com/bokulich-lab/RESCRIPt/blob/master/rescript/orient.py#L55

However, the main issue I see is that the current inputs and outputs are DNAFASTAFormat objects. A FASTQ-formatted input (e.g., coming from some of the SampleData[.*Sequence.*] types) could not have DNAFASTAFormat as a view type. I suppose we need something like Union[SingleLanePerSamplePairedEndFastqDirFmt, DNAFASTAFormat, ...] as the input and output view types, and a TypeMap in the plugin registration to accept and output the corresponding types.
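As a rough illustration of the "same format in, same format out" behaviour described above, here is a plain-Python analogue. It is not the QIIME 2 TypeMap API; FORMAT_MAP, sniff_format and output_format_for are hypothetical names, and the format check only looks at the first character of the first non-blank line.

```python
import io

# Hypothetical mapping: each input format maps to the same output format,
# mirroring what a TypeMap would express at the semantic-type level.
FORMAT_MAP = {"fasta": "fasta", "fastq": "fastq"}


def sniff_format(handle):
    """Return 'fasta' or 'fastq' from the first non-blank line of a text handle."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        break  # first real line is neither a FASTA nor a FASTQ header
    raise ValueError("input is neither FASTA nor FASTQ")


def output_format_for(handle):
    """Look up the output format that corresponds to the detected input format."""
    return FORMAT_MAP[sniff_format(handle)]
```

For example, output_format_for(io.StringIO("@read1\nACGT\n+\nIIII\n")) yields "fastq". In the actual plugin, this correspondence would be declared between semantic types at registration time rather than detected at runtime.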

colinvwood commented 11 months ago

Links to motivating forum posts here and here for reference.

mirand863 commented 11 months ago

Hi,

Thank you very much for opening this issue! I made some progress today. It is working with a single-end FASTQ file using the type MultiplexedSingleEndBarcodeInSequence, and I believe it would not be too much trouble to support other types, i.e., paired-end data and multiple samples inside a folder. However, I am currently running into an "is not complete type expression" error with the TypeMap. I will try to debug this further another day.

Best regards, Fabio