PacificBiosciences / pb-falcon-phase

FALCON-Phase integrates PacBio long-read assemblies with Phase Genomics Hi-C data to create phased, diploid, chromosome-scale scaffolds

Falcon-Phase second round for scaffolding #5

Open Juke34 opened 4 years ago

Juke34 commented 4 years ago

I have searched for hours for information on how to perform the scaffolding, and the only concrete information I found is here: https://github.com/phasegenomics/FALCON-Phase/issues/58

It's a good start, but I don't understand every step in detail. Could you elaborate on what is expected from step 6 onward?

#1) The starting point is the set of FALCON-Phase sequences (phase_0 & phase_1).
#2) Concatenate the phase_0 and phase_1.
cp phase_0.fasta phase_all.fasta
cat phase_1.fasta >> phase_all.fasta
#3) Align the hi-c data to the concatenated fasta (command in the snakemake).
bwa index phase_all.fasta
bwa mem -5 -t 36 phase_all.fasta sample_R1_001.fastq.gz sample_R2_001.fastq.gz | samtools view -S -h -b -F 2316 > sample.unfiltered.bam
#4) Filter the hi-c data (command in the snakemake).
falcon-phase bamfilt -f 20 -m 10 -i sample.unfiltered.bam -o sample.filtered.bam 
#5) Convert bam to binary matrix (command in the snakemake).
falcon-phase bam2 binmat sample.filtered.bam sample.filtered.binmat
#6) Scaffold either phase_0 or phase_1 sequences.
??
#7) Using the scaffolding results, make an overlap index file (described in the paper). Since each pseudohaplotype is paired (i.e. phase_0_0 and phase_1_0), you know they are in the same location on a scaffold.
??
#8) Run the falcon-phase binary with the required inputs.
??
#9) Swap out sequences in the scaffolds with the appropriate phase.
??
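
Since step 7 relies on every phase_0 sequence having a phase_1 partner, a quick sanity check before scaffolding may help. The following is only a minimal sketch, not part of FALCON-Phase: the helper names `count_records` and `check_phase_pairs` are hypothetical, and it assumes that pairing implies both FASTA files hold the same number of records.

```shell
# Count FASTA records (header lines starting with '>') in a file.
count_records() {
    grep -c '^>' "$1"
}

# Compare the record counts of the two phase files.
# Assumption (not confirmed by the FALCON-Phase docs): every phase_0
# sequence has exactly one phase_1 partner, so the counts should match.
check_phase_pairs() {
    n0=$(count_records "$1")
    n1=$(count_records "$2")
    if [ "$n0" -eq "$n1" ]; then
        echo "OK: $n0 paired sequences"
    else
        echo "MISMATCH: $1 has $n0 records, $2 has $n1" >&2
        return 1
    fi
}

# Usage (file names taken from step 2):
#   check_phase_pairs phase_0.fasta phase_1.fasta
```

This only compares counts; it does not verify that the names themselves match up, which would depend on the exact header naming scheme.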