Closed by adazhang1 3 years ago
Hi Ada, perhaps I misunderstood what I had downloaded from ENCODE. I thought I was looking at log fold change; I remember seeing negative values when I initially browsed around. If the ENCODE files weren't already log-transformed, then I didn't apply a log myself. So you could recreate the data by simply clipping the high end. I did also scale by 2 because it gave me slightly better results, but it didn't matter that much.
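For readers trying to reproduce this, the steps described in the reply (clip the high end, clip negatives to zero, scale by 2) might look roughly like the sketch below. It is not taken from the basenji code. The square-root form of the soft clip, the order of the operations, and the names `soft_clip` and `preprocess_track` are all assumptions made for illustration. Only the threshold of 32 and the scale of 2 come from the thread.

```python
import numpy as np


def soft_clip(x, threshold=32.0):
    """Compress values above `threshold` with a square root (assumed form).

    Values at or below the threshold pass through unchanged.
    """
    x = np.asarray(x, dtype=np.float64)
    # np.maximum keeps the sqrt argument non-negative for entries below threshold.
    excess = np.maximum(x - threshold, 0.0)
    return np.where(x > threshold, threshold + np.sqrt(excess), x)


def preprocess_track(fold_change, threshold=32.0, scale=2.0):
    """Hypothetical helper: soft-clip high values, zero out negatives, then scale.

    No log transform is applied, matching the reply above.
    The order of these steps is an assumption.
    """
    clipped = soft_clip(fold_change, threshold)
    clipped = np.maximum(clipped, 0.0)
    return scale * clipped


# Example: -0.5 is clipped to 0, 1.0 is doubled to 2, and 48 becomes
# 32 + sqrt(16) = 36 before being doubled to 72.
example = preprocess_track(np.array([-0.5, 1.0, 48.0]))
```

Comparing the output of a function like this against values read from the .tfr files would be one way to check whether the clip was applied before or after the scaling.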
Got it - thank you so much for your reply! (Also, I hope you have a great Thanksgiving holiday!)
Hi @davek44
Thanks for making your code and data public!
I am trying to compare my model / data to the basenji model / data, and have been digging through the .tfr files at https://console.cloud.google.com/storage/browser/basenji_barnyard/data.
The 2020 "Cross-species..." paper says that log fold change signal tracks were downloaded from ENCODE, high values were soft-clipped to 32, and negative values were clipped to zero. From my understanding, ENCODE only provides "fold change" tracks, not "log fold change" tracks. I therefore assumed that the ENCODE fold change tracks were soft-clipped, then passed through a log function, and then negative values were clipped to zero.
The above-mentioned .tfr files look like soft-clipped fold change tracks, scaled by 2. Looking through the basenji code, I haven't (yet) found any further data processing - i.e., taking a log and clipping negatives - after data import.
Could you help me understand - did you import the data and then take a log and clip negative values later in the pipeline? ...Or did you train on these soft-clipped fold change tracks? Or perhaps I misunderstood something else?
Thanks so much for your help, I really appreciate your time!
Ada