Nucleomics-VIB / InSilico_PCR

extract long read subsequences from a pair of primers
5 stars 1 forks source link
amplicon bbmap-msa insilico-pcr

(Nucleomics-VIB)

InSilico PCR

This Analysis aims at extracting genomic sequences mimicking 16S amplicon reads from a larger context long read set (gDNA reads or larger amplicons).

The motivation behind this analysis was to simulate 16S long amplicon sequencing on ONT platform using data obtained from the https://github.com/LomanLab/mockcommunity site as a large Promethion fastq archive but any other ONT or Pacbio fastq data can be used with this code (please acknowledge them when you would use this data too).

Data Availability

(reproduced from https://github.com/LomanLab/mockcommunity/edit/master/README.md)
Name Reads (M) Yield (G) FASTQ Run Folder Restarts FAST5
Zymo-PromethION-LOG-BB-SN 35.1 148 fastq.gz 64h run restarts download.sh, restarts.tar
Zymo-PromethION-EVEN-BB-SN 36.5 146 fastq.gz 64h run restarts download.sh, restarts.tar
Zymo-GridION-LOG-BB-SN 3.7 16 fastq.gz 48h run n/a signal.tar
Zymo-GridION-EVEN-BB-SN 3.5 14 fastq.gz 48h run n/a signal.tar

Method

The method used to extract sequences between primers was developed by Brian Bushnell based on his BBMap tools and is explained here.

The workflow is as follows:

Usage

REM: Snakemake does not always install well with bioconda and can be removed from the file environment.yaml if you encounter issues. It is not used in the current version of this code anyway and was added here for forward compatibility.

Results obtained with the demo data

Future plans

This code should and will be changed to a snakemake pipeline in order to be more portable. The config.yaml file is the first step towards this transition.


Please send comments and feedback to nucleomics.bioinformatics@vib.be


Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.