calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Black list #174

Status: Closed (yardenmatok203 closed this issue 1 year ago)

yardenmatok203 commented 1 year ago

Hey,

Can you share your blacklist of genomic regions that are problematic for prediction?

Thanks, Yarden.

davek44 commented 1 year ago

Sure...

Blacklist: https://storage.googleapis.com/basenji_barnyard2/hg38.blacklist.rep.bed

Unmappable: https://storage.googleapis.com/basenji_barnyard2/umap_k36_t10_l32_hg38.bed
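Both files are in BED format, so one way to use them is to check how much of a candidate region they cover. The following is a minimal sketch, not code from the Basenji repository: the function names `read_bed`, `merge`, and `overlapping_bases` are hypothetical, and the toy intervals stand in for the contents of a downloaded file. It assumes only the first three BED columns (chrom, 0-based start, exclusive end) matter.

```python
import io


def merge(sorted_intervals):
    """Merge overlapping or touching (start, end) intervals that are already sorted."""
    merged = []
    for start, end in sorted_intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def read_bed(handle):
    """Read chrom/start/end from a BED file handle into {chrom: merged intervals}."""
    intervals = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return {chrom: merge(sorted(ivs)) for chrom, ivs in intervals.items()}


def overlapping_bases(intervals, chrom, start, end):
    """Count bases of the half-open region [start, end) covered by the intervals."""
    return sum(
        max(0, min(end, iv_end) - max(start, iv_start))
        for iv_start, iv_end in intervals.get(chrom, [])
    )


# Toy data standing in for a real blacklist or unmappable BED file.
toy_bed = "chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n"
regions = read_bed(io.StringIO(toy_bed))
covered = overlapping_bases(regions, "chr1", 250, 400)
```

With a real file, `read_bed(open(path))` would replace the `io.StringIO` call. A linear scan per query is fine for a handful of regions; for genome-wide queries an interval tree or a dedicated BED library would be a better fit.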

yardenmatok203 commented 1 year ago

What is "Unmappable"?

Thanks, Yarden

davek44 commented 1 year ago

Unmappable specifies regions that are particularly difficult to map short reads to due to repeats.

yardenmatok203 commented 1 year ago

How should I use these mappability regions for prediction with your weights, for example for hff? How did you use those regions in training?

Thank you, Yarden

On Mon, Aug 14, 2023 at 23:18, David Kelley <@.***> wrote:

Unmappable specifies regions that are particularly difficult to map short reads to due to repeats.


yardenmatok203 commented 1 year ago

Hey David,

One more question: does data/hg38_gaps_binsize2048_numconseq10.bed handle this blacklist?

Thanks, Yarden

davek44 commented 1 year ago

Hi Yarden, I'm not sure I understand your questions. I don't recognize the filename you sent. You can choose to handle regions of difficult mappability however you'd like; there's no right or wrong way. I typically have the program clip the values in unmappable regions so that they aren't allowed to be very large.
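The clipping idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Basenji's actual implementation: the helper names `unmappable_bin_mask` and `clip_unmappable`, the `min_fraction` threshold, and the fixed `cap` value are all hypothetical. The sketch marks a bin as unmappable when at least a chosen fraction of its bases fall in unmappable intervals, then caps the values in those bins.

```python
import numpy as np


def unmappable_bin_mask(n_bins, bin_size, region_start, unmappable, min_fraction=0.5):
    """Return a boolean mask of bins whose unmappable fraction is >= min_fraction.

    `unmappable` is a list of half-open (start, end) intervals on the same
    chromosome as the region; `min_fraction` is an assumed threshold.
    """
    region_end = region_start + n_bins * bin_size
    covered = np.zeros(n_bins * bin_size, dtype=bool)
    for start, end in unmappable:
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo < hi:
            covered[lo:hi] = True
    # Fraction of unmappable bases in each bin.
    fraction = covered.reshape(n_bins, bin_size).mean(axis=1)
    return fraction >= min_fraction


def clip_unmappable(values, mask, cap):
    """Return a copy of `values` with masked bins limited to at most `cap`."""
    clipped = np.asarray(values, dtype=float).copy()
    clipped[mask] = np.minimum(clipped[mask], cap)
    return clipped


# Toy example: four 10 bp bins starting at position 1000.
mask = unmappable_bin_mask(
    n_bins=4, bin_size=10, region_start=1000,
    unmappable=[(1010, 1020), (1035, 1040)],
)
clipped = clip_unmappable([1.0, 50.0, 80.0, 3.0], mask, cap=5.0)
```

Only the bins flagged as unmappable are capped; large values in mappable bins pass through unchanged. How to pick the cap (a fixed constant, a percentile of the mappable bins, or something else) is a user choice, consistent with the point that there is no single right way to handle these regions.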