ernstlab / full_stack_ChromHMM_annotations

Data of genome annotation from full-stack ChromHMM model trained with 1032 datasets from 127 reference epigenomes

Overlapping annotations in hg38_genome_100_segments.bed #3

Closed balwierz closed 5 months ago

balwierz commented 8 months ago

Hi, I found that there are cases where regions overlap. For instance, the first line below overlaps the three regions that follow it.

chr1    146261547       148712700       40_EnhWk6
chr1    146868566       146872566       1_GapArtf1
chr1    146872566       146872966       7_Quies4
chr1    146872966       146873766       1_GapArtf1
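
A quick way to check a file for this kind of overlap is sketched below. It is only an illustration, not code from this repository: the function name `find_overlaps` is hypothetical, and it assumes the rows are (chrom, start, end, label) tuples in half-open BED coordinates. It tracks the row with the largest end seen so far on each chromosome, so a long segment is still caught when several short segments follow it.

```python
def find_overlaps(rows):
    """Flag each row that starts before the largest end seen so far.

    rows: iterable of (chrom, start, end, label) in half-open BED coordinates.
    Returns a list of (widest_earlier_row, overlapping_row) pairs.
    """
    overlaps = []
    widest = None  # row with the largest end so far on the current chromosome
    for row in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        chrom, start, end, _ = row
        same_chrom = widest is not None and widest[0] == chrom
        if same_chrom and start < widest[2]:
            overlaps.append((widest, row))
        # Keep tracking whichever row reaches furthest to the right.
        if not same_chrom or end > widest[2]:
            widest = row
    return overlaps


# The four rows quoted in this comment.
rows = [
    ("chr1", 146261547, 148712700, "40_EnhWk6"),
    ("chr1", 146868566, 146872566, "1_GapArtf1"),
    ("chr1", 146872566, 146872966, "7_Quies4"),
    ("chr1", 146872966, 146873766, "1_GapArtf1"),
]

for earlier, later in find_overlaps(rows):
    print(later[3], "at", later[1], "overlaps", earlier[3], "ending at", earlier[2])
```

Comparing each row only with its immediate predecessor would report just the second row here, because rows three and four start exactly where the previous short row ends.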
havu73 commented 8 months ago

That is a fundamental problem with the liftOver tool itself, and we do not yet know how to fix it.

(2) After lifting 200-bp segments from hg19 to hg38, we would like to get rid of regions in hg38 that are mapped from multiple regions in hg19. We coded this step with the implicit assumption that every 200-bp segment from hg19 is still 200 bp long once mapped to hg38. It therefore missed this case, where the end of one segment (148712700) is greater than the ends of the segments it overlaps (146872566, 146872966, 146873766).
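
The 200-bp assumption can be tested directly on the lifted segments. The snippet below is a minimal sketch and not the code used to build the release: `split_by_length` and `BIN_SIZE` are hypothetical names, the input is assumed to be the per-segment liftOver output before adjacent segments with the same state are merged, and the coordinates and state labels are made up for illustration.

```python
BIN_SIZE = 200  # segment length in hg19, as stated in this thread


def split_by_length(rows, bin_size=BIN_SIZE):
    """Separate lifted segments that kept their original length from those that did not.

    rows: iterable of (chrom, start, end, label) in half-open BED coordinates.
    Returns (kept, dropped) lists.
    """
    kept, dropped = [], []
    for row in rows:
        _, start, end, _ = row
        if end - start == bin_size:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Made-up lifted segments: two keep their length, two do not.
lifted = [
    ("chr1", 1000, 1200, "stateA"),
    ("chr1", 1200, 1400, "stateA"),
    ("chr1", 1400, 5000, "stateB"),  # stretched to 3600 bp
    ("chr1", 5000, 5150, "stateC"),  # shrunk to 150 bp
]

kept, dropped = split_by_length(lifted)
print(len(kept), "segments kept,", len(dropped), "dropped")
```

A length check like this says nothing about whether two kept segments still overlap each other, so an overlap check on the result remains worthwhile.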

Our solution at the moment: we reran liftOver and resolved all these issues by simply removing every hg38 segment that is mapped from a 200-bp segment in hg19 but is not exactly 200 bp long in hg38. In other words, we strictly limit the liftOver to hg19 segments that have a unique 200-bp mapped segment in hg38. The files can be downloaded from https://public.hoffman2.idre.ucla.edu/ernst/2K9RS//full_stack/full_stack_annotation_public_release/hg38. These are the files you can use:

havu73 commented 8 months ago

I typed my answer in a Word document, and I am not sure why it could not be pasted as text. Here is the link where you can download the hg38 chromatin state maps (please see the answer above for the files that are of interest to you): https://public.hoffman2.idre.ucla.edu/ernst/2K9RS//full_stack/full_stack_annotation_public_release/hg38

havu73 commented 5 months ago

Hello! We have updated our readme with new links to the data, so the problem of overlapping annotations is no longer present. Is it okay if we close this issue? We will keep it open for 3 more days and then close it.