fhcrc / deenurp

16S rRNA gene sequence curation and phylogenetic reference set creation
GNU General Public License v3.0

orientate_sequences using vsearch #34

Closed. crosenth closed this issue 6 years ago

crosenth commented 9 years ago

Create a script that reverse-complements sequences that are in the wrong orientation. The output will be a FASTA file of the sequences that matched at a given percent identity, plus an optional CSV with the alignment indexes on the target sequence(s). An optional notmatched output will be available as well.
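For reference, the reverse-complement step itself can be sketched in a few lines of Python. This is only an illustration, not the deenurp code; the helper name `reverse_complement` and the choice to handle IUPAC ambiguity codes are assumptions.

```python
# Hypothetical helper, not taken from deenurp.
# Complement table covering IUPAC nucleotide codes in upper and lower case.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside the table (gaps, for example) pass through unchanged and are only reversed.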

crosenth commented 9 years ago

deenurp orientate_sequences --help

usage: deenurp orientate_sequences [-h] [--threads NUM] [--id ID] [--out fasta]
                                   [--out_csv csv] [--out_notmatched fasta]
                                   fasta fasta

Fix orientation of sequences and output target sequence alignment indexes

positional arguments:
  fasta                   input sequences
  fasta                   target sequences

optional arguments:
  -h, --help              show this help message and exit
  --threads NUM           number of available threads [all]
  --id ID                 alignment identity percent

outputs:
  --out fasta             [stdout]
  --out_csv csv           output csv with columns query,target,tilo,tihi
  --out_notmatched fasta  seqnames that did not match tseqs at id threshold
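A rough sketch of how the three outputs could be derived once the search results are in hand. It assumes each hit arrives as a (query, target, qstrand, tilo, tihi) tuple, for example parsed from a tab-separated vsearch results file; the function names `orient` and `rows_to_csv`, the keep-first-hit rule, and the in-memory dict of sequences are all hypothetical and not taken from orientate_sequences.py.

```python
import csv
import io

# Plain ACGT complement table; enough for this illustration.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seqs, hits):
    """Split sequences into oriented matches and non-matches.

    seqs: dict mapping sequence name to sequence string.
    hits: iterable of (query, target, qstrand, tilo, tihi) tuples (assumed layout).
    Returns (matched, notmatched, rows).
    """
    matched, rows = {}, []
    for query, target, qstrand, tilo, tihi in hits:
        # Keep only the first hit per query (an assumption) and skip unknown names.
        if query in matched or query not in seqs:
            continue
        seq = seqs[query]
        # A minus-strand hit means the query is in the wrong orientation.
        matched[query] = reverse_complement(seq) if qstrand == "-" else seq
        rows.append({"query": query, "target": target,
                     "tilo": int(tilo), "tihi": int(tihi)})
    notmatched = [name for name in seqs if name not in matched]
    return matched, notmatched, rows


def rows_to_csv(rows):
    """Serialize rows using the query,target,tilo,tihi columns from the help text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["query", "target", "tilo", "tihi"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Here `matched` would feed --out, `notmatched` would feed --out_notmatched, and `rows_to_csv(rows)` would feed --out_csv.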

crosenth commented 9 years ago

Needs some test cases.

crosenth commented 9 years ago

Allow seq_info input (notmatched_seq_info.csv and matched_seq_info.csv)

crosenth commented 8 years ago

https://github.com/fhcrc/deenurp/blob/master/deenurp/subcommands/orientate_sequences.py

Still need some unit tests.

crosenth commented 8 years ago

TODO: Need to filter out low coverage alignments
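One possible shape for that filter, as a sketch only. It assumes coverage means the fraction of the query spanned by the alignment, computed from 1-based inclusive query start and end positions and the query length; the keys `qlo`, `qhi`, `ql`, the function names, and the 0.9 default are hypothetical choices, not anything decided in this issue.

```python
def query_coverage(qlo, qhi, qlen):
    """Fraction of the query spanned by the alignment (1-based, inclusive coordinates)."""
    if qlen <= 0:
        raise ValueError("query length must be positive")
    return (qhi - qlo + 1) / qlen


def filter_low_coverage(hits, min_coverage=0.9):
    """Keep only hits whose query coverage is at least min_coverage.

    hits: iterable of dicts with integer 'qlo', 'qhi' and 'ql' keys (assumed layout).
    """
    return [
        hit for hit in hits
        if query_coverage(hit["qlo"], hit["qhi"], hit["ql"]) >= min_coverage
    ]
```

Queries whose only hits fall below the threshold would then end up in the notmatched output rather than being reoriented on weak evidence.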