Combining PoreC data - Githubissues

Hi Zhou,

I'm currently using YaHS with ONT PoreC data to scaffold my plant genome (~800 Mb). I've tried two different PoreC protocols (three libraries each, with the .bed files concatenated per protocol), resulting in:

protocol A) 19.2 M paired reads, 2.5 M intra, 1.67 M inter (Read N50 ~1 kb)

[Image: short_scaffolds]

protocol B) 16.8 M paired reads, 5.9 M intra, 10.8 M inter (Read N50 ~3.3 kb)

[Image: Long_scaffolds]

I then concatenated all .bed files from both protocols and ran yahs on the combined set, resulting in:

[Image: All_Scaffolds]

According to the Juicebox results and assembly statistics, combining the runs gives noisier and more misassembled results than using either dataset alone. I wonder whether the problem lies in the mix of long- and short-distance contacts, or in the different ratios of intra/inter contacts to read pairs between the datasets. Or should the two protocols be run consecutively? Could it be that the first dataset (protocol A) has a negative impact?
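To make the ratio question concrete, here is a minimal Python sketch that computes the intra/inter fractions from the rounded counts quoted above. The function name `contact_ratios` is purely illustrative and not part of YaHS, and the inputs are only the approximate figures from this post.

```python
def contact_ratios(paired, intra, inter):
    """Return intra/inter fractions of all paired reads and the intra:inter ratio."""
    return {
        "intra_frac": intra / paired,
        "inter_frac": inter / paired,
        "intra_to_inter": intra / inter,
    }


# Rounded counts from this post: (paired reads, intra, inter).
protocols = {
    "A": (19.2e6, 2.5e6, 1.67e6),
    "B": (16.8e6, 5.9e6, 10.8e6),
}

for name, counts in protocols.items():
    r = contact_ratios(*counts)
    print(
        f"protocol {name}: intra {r['intra_frac']:.1%}, "
        f"inter {r['inter_frac']:.1%}, intra:inter {r['intra_to_inter']:.2f}"
    )
```

With these counts, protocol A has an intra:inter ratio of about 1.5, while protocol B is about 0.55, so the two datasets differ both in the share of read pairs classified as contacts and in the balance between intra and inter contacts.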

Best,
Freya

c-zhou / yahs

Combining PoreC data #59