glogsdon1 / sunk-based_assembly


run_sharedSUNKs.sh #5

Closed: zhoudreames closed this issue 3 years ago.

zhoudreames commented 3 years ago

The shell script takes three input files: sharedSUNKsDir, readsPosition, and kmers. What do these files contain, and how do I generate them? Thanks!

zhoudreames commented 3 years ago

@glogsdon1

glogsdon1 commented 3 years ago

I have reorganized this repository to include the step that generates these files, along with the files themselves as example output. They are located here: scripts/1_mapping_sunks

To generate the files, run Snakemake.py. The required inputs are:

- ont_reads.fasta: the FASTA of the ONT reads you wish to stitch together
- kmers.fofn: a file of filenames (fofn) that lists the paths to the files containing the SUNKs
- SUNKsharing.py: a script that outputs shared_SUNKs.tbl
- KmersPos.py: a script that outputs the reads.positions and reads.fa files
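To show roughly what the position step involves, here is a minimal Python sketch of the idea behind KmersPos.py. It is not the repository's code. It assumes that each file listed in kmers.fofn holds one SUNK per line, that all SUNKs share the same length k, and it scans only the forward strand of each read. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator, Set, Tuple


def load_sunks(fofn_path: str) -> Set[str]:
    """Collect SUNKs from every file listed in an fofn (one path per line).

    Assumption: each listed file stores one k-mer per line.
    """
    sunks: Set[str] = set()
    for line in Path(fofn_path).read_text().splitlines():
        path = line.strip()
        if path:  # skip blank lines in the fofn
            sunks.update(k.upper() for k in Path(path).read_text().split())
    return sunks


def read_fasta(fasta_path: str) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA parser yielding (read_id, sequence) pairs.

    The read ID is the first whitespace-delimited token of the header;
    multi-line sequences are joined and upper-cased.
    """
    read_id, chunks = None, []
    with open(fasta_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if read_id is not None:
                    yield read_id, "".join(chunks)
                read_id, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)


def sunk_positions(
    reads: Iterable[Tuple[str, str]], sunks: Set[str], k: int
) -> Iterator[Tuple[str, str, int]]:
    """Yield (read_id, sunk, position) for every SUNK hit on the forward strand."""
    for read_id, seq in reads:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in sunks:
                yield read_id, kmer, i  # 0-based position in the read
```

Each yielded tuple corresponds to one row of a reads.positions-style table (read ID, SUNK, position). Writing the rows with `"\t".join(map(str, row))` would produce a tab-separated file, but this thread does not state the real column separator.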

I have added the ont_reads.fasta and kmers.fofn files as example inputs, but you should replace them with your own: the set of reads you want to stitch together and the SUNKs you've generated for your genome. I should also note that these scripts are designed to run on our cluster and will need to be modified to run on yours.

This process generates a set of output files, which are the ones you were asking for.

- sharedSUNKsDir is a table (shared_SUNKs.tbl) that lists each pair of reads, the number of SUNKs shared between the pair, and the list of SUNKs they share.
- readsPosition is a table (reads.positions) that lists the read ID, the SUNK, and its position in the read.
- kmers is a file (chr8_asat_ONT.kmers) that contains all of the SUNKs found in the reads.
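The shared-SUNK table and the kmers file can both be derived from the position rows. The Python sketch below illustrates that relationship. It is not SUNKsharing.py itself. It assumes positions arrive as (read_id, sunk, position) tuples, and the tab-separated, comma-joined output layout is a guess at the format described above. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str, int]  # (read_id, sunk, position), as in reads.positions


def shared_sunks(rows: Iterable[Row]) -> Dict[Tuple[str, str], List[str]]:
    """Map each read pair (sorted IDs) to the sorted SUNKs both reads contain."""
    # Inverted index: SUNK -> set of reads containing it.
    reads_by_sunk: Dict[str, Set[str]] = defaultdict(set)
    for read_id, sunk, _pos in rows:
        reads_by_sunk[sunk].add(read_id)

    # Only pairs that co-occur on at least one SUNK are ever generated.
    shared: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sunk, reads in reads_by_sunk.items():
        for a, b in combinations(sorted(reads), 2):
            shared[(a, b)].add(sunk)
    return {pair: sorted(sunks) for pair, sunks in shared.items()}


def format_shared_table(table: Dict[Tuple[str, str], List[str]]) -> str:
    """One line per pair: read_a, read_b, number of shared SUNKs, SUNK list.

    Assumption: tab-separated columns with the SUNKs joined by commas.
    """
    return "\n".join(
        f"{a}\t{b}\t{len(sunks)}\t{','.join(sunks)}"
        for (a, b), sunks in sorted(table.items())
    )


def all_sunks(rows: Iterable[Row]) -> List[str]:
    """Distinct SUNKs found in the reads, i.e. what the kmers file lists."""
    return sorted({sunk for _read, sunk, _pos in rows})
```

Indexing reads by SUNK first means only pairs that actually share a SUNK are generated, rather than comparing every read against every other read.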

The other output file (reads.fa) is simply the FASTA of the ONT reads that share SUNKs, and it is usually identical to the input ont_reads.fasta.
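One way to see why that FASTA usually matches the input is that a filter of this kind drops a read only if it shares no SUNK with any other read. Below is a minimal sketch of such a filter. It assumes the shared pairs come from a table like shared_SUNKs.tbl. The function names are hypothetical, and this is not how the repository necessarily builds reads.fa.

```python
from typing import Iterable, List, Tuple


def reads_sharing_sunks(
    reads: Iterable[Tuple[str, str]],
    shared_pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep reads that appear in at least one SUNK-sharing pair, in input order."""
    keep = {read_id for pair in shared_pairs for read_id in pair}
    return [(rid, seq) for rid, seq in reads if rid in keep]


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 80) -> str:
    """Format (read_id, sequence) records as FASTA, wrapping sequence lines."""
    lines: List[str] = []
    for rid, seq in records:
        lines.append(f">{rid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

If every input read appears in at least one pair, the filtered output contains exactly the input reads.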

zhoudreames commented 3 years ago

Thank you!