WilsonSayresLab / XYalign

Identifying, understanding, and correcting technical biases on the sex chromosomes in next-generation sequencing data

Strip & realign high coverage 1000genomes samples #11

Closed mathbionerd closed 7 years ago

mathbionerd commented 7 years ago

Need to strip BAMs -> FASTQ -> sort -> realign to: A) hg19, B) hg38, C) GRCh38
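For reference, the strip-and-realign steps could be sketched roughly as below. This is only an illustration: the thread does not say which tools were used, so `samtools` and `bwa mem` are assumptions, as are the function name `realign_commands`, the output directory layout, and the file names. The function only builds the command strings so they can be reviewed before anything is run. Read groups are not handled here.

```python
from pathlib import Path


def realign_commands(bam, reference, out_dir="realigned", threads=4):
    """Build (but do not run) shell commands to strip one BAM to FASTQ
    and realign it to one reference. Tool choice (samtools, bwa) is an
    assumption, not something stated in this issue."""
    sample = Path(bam).stem          # e.g. "HG00419"
    ref_name = Path(reference).stem  # e.g. "hg19"
    fq = f"{out_dir}/{sample}"
    out = f"{out_dir}/{sample}.{ref_name}"
    return [
        # Name-sort so that mates are adjacent before FASTQ extraction.
        f"samtools sort -n -@ {threads} -o {fq}.namesorted.bam {bam}",
        # Strip alignments: write paired reads to two FASTQ files and
        # discard singletons and unpaired-flag reads.
        f"samtools fastq -1 {fq}_1.fastq -2 {fq}_2.fastq "
        f"-0 /dev/null -s /dev/null {fq}.namesorted.bam",
        # Realign and coordinate-sort the output.
        f"bwa mem -t {threads} {reference} {fq}_1.fastq {fq}_2.fastq "
        f"| samtools sort -@ {threads} -o {out}.sorted.bam -",
        f"samtools index {out}.sorted.bam",
    ]


if __name__ == "__main__":
    # Hypothetical file names, one command list per reference genome.
    for ref in ("hg19.fa", "hg38.fa"):
        for cmd in realign_commands("HG00419.bam", ref):
            print(cmd)
```

Calling the function once per reference gives one realigned BAM per sample per genome, which is the sample-by-reference layout discussed further down.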

Phillip-a-richmond commented 7 years ago

What is the difference between hg38 and GRCh38?

thw17 commented 7 years ago

hg38 and GRCh38 should differ only in chromosome naming and annotation (I believe); otherwise I think the sequences are supposed to be identical. However, it will be informative to compare hg38/GRCh38 with the 1000 genomes equivalent.
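To illustrate the naming difference, here is a minimal sketch that converts between UCSC-style names ("chr1", "chrX", "chrM") and unprefixed names ("1", "X", "MT"). The function names are made up for this example, and it assumes only the primary chromosomes matter; unplaced scaffolds, alt contigs, and decoys follow other naming schemes and are not handled.

```python
def ucsc_to_unprefixed(name):
    """Convert a UCSC-style primary chromosome name to the unprefixed style."""
    if name == "chrM":
        return "MT"  # the mitochondrial contig is named differently
    return name[3:] if name.startswith("chr") else name


def unprefixed_to_ucsc(name):
    """Convert an unprefixed primary chromosome name to the UCSC style."""
    if name == "MT":
        return "chrM"
    return name if name.startswith("chr") else "chr" + name


if __name__ == "__main__":
    for chrom in ("chr1", "chrX", "chrY", "chrM"):
        print(chrom, "->", ucsc_to_unprefixed(chrom))
```

A mapping like this is only needed when comparing coordinates or annotations across the two naming styles; it says nothing about whether the underlying sequences match.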

mathbionerd commented 7 years ago

I think it will be important to compare them with the 1000genomes version, but I don't think it is necessary for the initial publication.

So, just focus on hg19 & hg38 (ignoring 1000genomes version of GRCh38) for now?

tanyaphung commented 7 years ago

Focusing on hg19 & hg38 for now sounds good.

thw17 commented 7 years ago

Yes, that sounds good.

Madelinehazel commented 7 years ago

Will this be done on CGC? And if so, has someone been assigned this task?

mathbionerd commented 7 years ago

@Madelinehazel - if you can do the stripping and realignment locally, that might be ideal; then you can share the realigned BAM. That said, there are now 5 BAMs on CGC. Let me know how you'd like to proceed.

mathbionerd commented 7 years ago

@Madelinehazel - you already did this for two files - right? For one male and one female - which genome was it to? Can you also run it to the other (either hg19 or hg38)?

Then, I think we can proceed with these two example files, each aligned to the two different reference genomes.

Madelinehazel commented 7 years ago

@mwilsonsayres I can do that locally. I've done the stripping and realignment for HG00419 (female) on GRCh38 but not for a male, as I didn't have a Y mask at the time. I can do the stripping and realignment for the male and female for hg19 and hg38 as well. My server is acting up but I am hoping this will be resolved today.

mathbionerd commented 7 years ago

Sounds great, @Madelinehazel - I think that for this, what we want is straight-up alignments without accounting for the sex chromosome-specific biology - as most people would do.

So, running the alignment for HG00419 to the whole genome (including the Y), and running a genetic male sample to the reference genome that is downloaded automatically.

Then, we can use these two individuals, for both reference genomes, to run through XYalign.

mathbionerd commented 7 years ago

Notes for myself.

Information about 1000 genomes available sequences:

http://www.internationalgenome.org/data-portal/sample

There are eight samples that currently have:

HG00513 Female CHS
HG00512 Male CHS
HG00733 Female PUR
HG00731 Male PUR
NA19238 Female YRI
NA19240 Female YRI
NA19239 Male YRI
HG00732 Female PUR

thw17 commented 7 years ago

I used fastqs to avoid issues with stripped reads. Closing this issue.