Hi Manuel,
The main difference between GEM2 and GEM3 is that they allow different mismatches when mapping the reads, so we cannot expect the same number of mapped reads. However, that alone does not explain large differences in the numbers.
In my view, your main problem arises between step 1 and step 2: the number of mapped reads is suspiciously low. Which publication are you using to test?
Regards
David
Hi David,
Thanks a lot for your answer. I agree that the problem likely comes from these steps. The publication I'm following is currently under review (by some collaborators of ours). I'm having trouble reaching the person who was in charge of the capture Hi-C data processing, and figured it would be faster to try to reprocess the data myself. While the article is not yet public, the dataset is, and can be found here: https://www.ebi.ac.uk/ena/browser/view/PRJEB42293
Cheers, Manuel
Hello,
I am trying to process the data from a capture Hi-C experiment of human chromosome 12. I am following the steps from the tutorial using data from a publication, trying to reach results similar to theirs. I'm using fragment-based mapping. My problem is that after keeping the uniquely interacting read pairs, the number of reads decreases enormously, and my interaction matrices become somewhat sparse. Here are some examples:
As you can see, the number of uniquely mapped read pairs is less than 10% of the original number of reads. However, in the publication I'm taking as a reference, the number of uniquely mapped pairs is 52,386,237 at this stage of the pipeline, which I believe is a more reasonable number. Am I right that my numbers are just too low?
Some information on my procedure:
Fatal error (gem-indexer_fasta2meta+cont.c:368,main) Malformed FASTA/FASTQ file (sequence #1)
I tried to solve it but didn't find much on the Internet and gave up. I installed GEM3 instead using conda (I performed the whole TADbit installation on conda), and it worked without a problem. I assumed it was okay, but since I'm now facing weird results, I'm starting to wonder whether the Chr12 file I used as a reference genome is correct, or if a different version of GEM might be causing this.
Just in case, I'm running TADbit 1.0.1 (the latest available version on conda). I also redownloaded the files and reference genome and reran everything, obtaining identical results.
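One way to test the doubt about the reference file is a quick structural check of the FASTA itself. The sketch below is only an illustration: `check_fasta` and its messages are hypothetical names, the set of accepted characters (IUPAC nucleotide codes plus `*` and `-`) is an assumption, and it does not reproduce whatever checks the GEM indexer actually performs. It flags a few common problems: sequence data before the first `>` header, headers without sequence, blank lines, Windows line endings, and unexpected characters.

```python
import re

# Assumption: IUPAC nucleotide codes plus '*' and '-' are acceptable sequence characters.
VALID_SEQ = re.compile(r"[ACGTUNRYKMSWBDHV*-]+", re.IGNORECASE)


def check_fasta(lines):
    """Return a list of (line_number, message) problems found in FASTA lines."""
    problems = []
    header_line = None   # line number of the current record's header
    residues = 0         # sequence characters seen in the current record
    warned_crlf = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            if not warned_crlf:  # report CRLF endings only once
                problems.append((number, "Windows-style line ending (CRLF)"))
                warned_crlf = True
            line = line.rstrip("\r")
        if line.startswith(">"):
            if header_line is not None and residues == 0:
                problems.append((header_line, "header without sequence"))
            if not line[1:].strip():
                problems.append((number, "empty header name"))
            header_line, residues = number, 0
        elif not line.strip():
            problems.append((number, "blank line"))
        elif header_line is None:
            # Nothing after this point is meaningful without a header.
            problems.append((number, "sequence data before the first '>' header"))
            return problems
        else:
            if not VALID_SEQ.fullmatch(line):
                problems.append((number, "unexpected characters in sequence"))
            residues += len(line)
    if header_line is None:
        problems.append((0, "no '>' header found"))
    elif residues == 0:
        problems.append((header_line, "header without sequence"))
    return problems
```

To run it on a file, open the file with `newline=""` so that carriage returns are not silently translated, for example `with open("chr12.fa", newline="") as handle: print(check_fasta(handle))`, where `chr12.fa` is a placeholder path. An empty list means none of these particular problems were found; it does not guarantee that the file is the right assembly or that every tool will accept it.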
Any help would be more than welcome, I've been struggling with this for a few days now.
Thanks a lot, Manuel F. Merino