BilkentCompGen / hercules

Profile HMM-based hybrid error correction algorithm for long reads
BSD 3-Clause "New" or "Revised" License

The size of the CH17-157L1 data set #15

Closed (Hulanyue closed 4 years ago)

Hulanyue commented 4 years ago

Hi, I am sorry to bother you again!

The paper gives the size of the CH17-157L1 data set as 231 kb, with the ID SRR1171743, but the data set I downloaded from NCBI is about 1.1 GB and contains 261.6 Mbp.

I want to know whether 231 kb is the size of the sample used in the experiments.

If so, does that mean I can subsample the data set at will, as long as the sampled data size (231 kb) is satisfied?

Thank you!

Hulanyue commented 4 years ago

I also have a question about getting the Illumina paired-end reads.

When I open http://eichlerlab.gs.washington.edu/pacbio-complex-regions and try to get the Illumina reads for CH17-157L1, it shows me: You don't have permission to access /pacbio-complex-regions/illumina_reads/CH17-157L1/ on this server.

Can you help me?

Thank you very very much!

calkan commented 4 years ago

SRA refers to the raw reads. What you are looking for is the CH17-157L1 assembly. For this you need to search for CH17-157L1 in the NCBI nucleotide database: https://www.ncbi.nlm.nih.gov/nuccore/

Then you'll get: https://www.ncbi.nlm.nih.gov/nuccore/AC254825.1
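For anyone who wants to check the size of that record without going through the web interface, here is a minimal sketch. It assumes the public NCBI E-utilities `efetch` endpoint with `db=nuccore` and `rettype=fasta`; the helper names (`efetch_fasta_url`, `fasta_total_bases`, `fetch_assembly_length`) are made up for illustration and are not part of hercules.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed to be reachable from your network).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build the efetch URL that returns a nuccore record as plain-text FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fasta_total_bases(text: str) -> int:
    """Count sequence characters in FASTA text, skipping header and blank lines."""
    return sum(
        len(line.strip())
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def fetch_assembly_length(accession: str) -> int:
    """Download the record (network access required) and return its length in bases."""
    with urlopen(efetch_fasta_url(accession)) as resp:
        return fasta_total_bases(resp.read().decode("ascii"))


# Example call (hypothetical usage, needs network access):
# print(fetch_assembly_length("AC254825.1"))
```

The length printed for AC254825.1 can then be compared against the size reported in the paper, rather than against the size of the raw SRA run.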

Hulanyue commented 4 years ago

@calkan Thanks for your reply! So AC254825.1 is the erroneous long-read data, is that right? Can you tell me how I can get the Illumina reads? Or is AC243627.3 the Illumina reads? (But the size of AC243627.3 is only 228 kb.)

Hulanyue commented 4 years ago

I am sorry! I read the paper again, and AC243627.3 is the reference genome for the corrected long reads (LRs). What I am looking for are the erroneous LRs and the short reads (SRs) that were mapped to the LRs.

calkan commented 4 years ago

No, AC254825.1 is the assembly based on Sanger sequencing. SRR1171743 is the PacBio data. The Illumina data was available at the Eichler Lab web site, but I have no admin access to it. We also just downloaded it when it was available. Try asking John Huddleston, the first author of the related paper.

Hulanyue commented 4 years ago

Okay, thank you very, very much! Is https://github.com/huddlej the GitHub account where I can ask John Huddleston?

calkan commented 4 years ago

That's him.

Hulanyue commented 4 years ago

Okay, thank you again!