mathbionerd closed 7 years ago
What is the difference between hg38 and GRCh38?
hg38 and GRCh38 should only differ in chromosome naming and annotation (I believe); otherwise I think the sequences are supposed to be identical. However, it will be informative to compare hg38/GRCh38 with the 1000 Genomes equivalent.
I think it will be important to compare them with the 1000 Genomes version, but I don't think it is necessary for the initial publication.
So, just focus on hg19 & hg38 (ignoring the 1000 Genomes version of GRCh38) for now?
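One way to check the "sequences are identical apart from naming" assumption is to checksum each sequence and compare the checksums while ignoring the sequence names. The following is only a minimal sketch: the function names are made up for illustration, and uppercasing the bases (so that soft-masking differences are ignored) is an assumption about what "identical" should mean here.

```python
import hashlib


def fasta_checksums(lines):
    """Return {sequence_name: md5 of the uppercased sequence} for FASTA lines."""
    sums = {}
    name, md5 = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Finish the previous record before starting a new one.
            if name is not None:
                sums[name] = md5.hexdigest()
            name = line[1:].split()[0]
            md5 = hashlib.md5()
        elif name is not None:
            # Stream the sequence so line wrapping does not affect the checksum.
            md5.update(line.upper().encode("ascii"))
    if name is not None:
        sums[name] = md5.hexdigest()
    return sums


def same_sequences(fasta_a, fasta_b):
    """True if both FASTAs hold the same set of sequences, ignoring names."""
    sums_a = sorted(fasta_checksums(fasta_a).values())
    sums_b = sorted(fasta_checksums(fasta_b).values())
    return sums_a == sums_b
```

Both arguments can be open file handles, for example `same_sequences(open("hg38.fa"), open("GRCh38.fa"))`, where the file names are placeholders. Extra contigs present in only one of the files would make the comparison return False.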
Focusing on hg19 & hg38 for now sounds good.
Yes, that sounds good.
Will this be done on CGC? And if so, has someone been assigned this task?
@Madelinehazel - if you can do the stripping and realignment locally, that might be ideal; then you can share the realigned BAM. That said, there are now 5 BAMs on CGC. Let me know how you'd like to proceed.
@Madelinehazel - you already did this for two files - right? For one male and one female - which genome was it to? Can you also run it to the other (either hg19 or hg38)?
Then, I think we can proceed with these two example files, each aligned to the two different reference genomes.
@mwilsonsayres I can do that locally. I've done the stripping and realignment for HG00419 (female) on GRCh38 but not for a male, as I didn't have a Y mask at the time. I can do the stripping and realignment for the male and female for hg19 and hg38 as well. My server is acting up but I am hoping this will be resolved today.
Sounds great, @Madelinehazel - I think that for this, what we want is straight-up alignments without accounting for the sex chromosome-specific biology, as most people would do.
So, that means running the alignment for HG00419 to the whole genome (including the Y), and aligning a genetic male sample to the reference genome that is downloaded automatically.
Then, we can use these two individuals, for both reference genomes, to run through XYalign.
Notes for myself.
Information about 1000 genomes available sequences:
http://www.internationalgenome.org/data-portal/sample
There are eight samples that currently have:
HG00513 Female CHS
HG00512 Male CHS
HG00733 Female PUR
HG00731 Male PUR
NA19238 Female YRI
NA19240 Female YRI
NA19239 Male YRI
HG00732 Female PUR
I used FASTQs to avoid issues with stripped reads. Closing this issue.
Need to strip BAMs -> FASTQ -> sort -> realign to: A) hg19, B) hg38, C) GRCh38
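The strip-and-realign steps above could be sketched roughly as follows. This only builds the shell commands, it does not run them. The tool choice (samtools and bwa mem), the read-group tag, the thread count, and all file names are assumptions for illustration and are not taken from this thread. In this sketch, reads are grouped by name with `samtools collate` before the FASTQ conversion, and the coordinate sort happens after alignment; each reference is assumed to be indexed already with `bwa index`.

```python
import shlex


def realignment_plan(bam, sample, references, threads=4):
    """Build shell commands to strip a BAM to FASTQ and realign it.

    `references` maps a label (e.g. "hg19") to a FASTA path (hypothetical).
    Returns a list of command strings: one strip step, then an
    align-and-sort step and an index step per reference.
    """
    r1 = f"{sample}_1.fastq.gz"
    r2 = f"{sample}_2.fastq.gz"
    # Group mates together, then write paired reads and drop the rest.
    commands = [
        f"samtools collate -u -O {shlex.quote(bam)} | "
        f"samtools fastq -1 {r1} -2 {r2} -0 /dev/null -s /dev/null -n -"
    ]
    # Literal backslash-t is what bwa expects inside the -R read-group string.
    read_group = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}")
    for label, fasta in references.items():
        out_bam = f"{sample}.{label}.sorted.bam"
        commands.append(
            f"bwa mem -t {threads} -R {read_group} {shlex.quote(fasta)} {r1} {r2} | "
            f"samtools sort -o {out_bam} -"
        )
        commands.append(f"samtools index {out_bam}")
    return commands


if __name__ == "__main__":
    refs = {"hg19": "hg19.fa", "hg38": "hg38.fa", "GRCh38": "GRCh38.fa"}
    for cmd in realignment_plan("HG00419.bam", "HG00419", refs):
        print(cmd)
```

Printing the commands first makes it easy to review them before running anything on a large BAM.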