Format of sample_wigs_file in basenji_hdf5_single.py?

calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.

Apache License 2.0


Format of sample_wigs_file in basenji_hdf5_single.py? #13

Closed: jakeyeung closed this issue 6 years ago

jakeyeung commented 6 years ago

Hello Dave,

I was wondering what the expected format is for the sample_wigs_file input to basenji_hdf5_single.py?

When I looked at the tutorial, it suggested a format with two columns (from data/heart_wigs.txt):

aorta        data/CNhs11760.bw
artery        data/CNhs12843.bw
pulmonic_valve        data/CNhs12856.bw

Unfortunately, when I run basenji_hdf5_single.py, it raises an IndexError at line 186:

  for line in open(sample_wigs_file, encoding='UTF-8'):
    a = line.rstrip().split('\t')
    target_wigs[a[0]] = a[1]
    target_strands.append(a[2])

It looks like the script tries to access a[2], the third tab-separated column, but data/heart_wigs.txt has only two columns.
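For illustration, here is a minimal sketch of a parser that accepts both the two-column layout from the tutorial and a three-column layout with a strand. This is not the fix that was later pushed to master; the function name `read_sample_wigs` and the `*` placeholder for a missing strand are assumptions made for this example only.

```python
import io


def read_sample_wigs(handle, default_strand="*"):
    """Parse tab-separated lines of the form label<TAB>path[<TAB>strand].

    Hypothetical helper: the '*' default strand is an assumption,
    not a value taken from basenji itself.
    """
    target_wigs = {}      # label -> path to the BigWig file
    target_strands = []   # one strand entry per parsed line
    for line_num, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            raise ValueError(
                f"line {line_num}: expected at least 2 tab-separated "
                f"columns, got {len(fields)}"
            )
        target_wigs[fields[0]] = fields[1]
        # Only read the third column when it is actually present.
        target_strands.append(fields[2] if len(fields) > 2 else default_strand)
    return target_wigs, target_strands


# One two-column line and one three-column line.
example = io.StringIO("aorta\tdata/CNhs11760.bw\nartery\tdata/CNhs12843.bw\t+\n")
wigs, strands = read_sample_wigs(example)
print(wigs)
print(strands)
```

Checking the column count before indexing turns the bare IndexError into a message that names the offending line.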

What is the expected input format for sample_wigs_file?

Best,

Jake

davek44 commented 6 years ago

Ah sorry. I'm working to add support for stranded sequencing datasets. I pushed a fix to master.

WebbYang commented 5 years ago

Hello Dave,

May I ask what the identifier column is used for? I'm trying to run basenji_hdf5_single.py on other data downloaded from ENCODE, but it fails with:

[urlOpen] Couldn't open identifier for reading
[urlOpen] Couldn't open identifier for reading
[pyBwOpen] bw is NULL!

What should the identifier column contain? Thank you!
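One possible reading of the log above is that the literal string `identifier` was passed on as a file path, for example from a header row. That is an assumption, since the command and input file are not shown. Under that assumption, a small pre-flight check like the sketch below could flag such rows before any BigWig file is opened. The helper name `find_unreadable_tracks` and the example rows are hypothetical.

```python
import os


def find_unreadable_tracks(rows, path_column=1):
    """Return (row_number, path) pairs whose path is neither an existing
    local file nor a URL. A row too short to hold the path column is
    reported with a path of None.

    Hypothetical helper; path_column=1 assumes the path is the second field.
    """
    problems = []
    for row_num, fields in enumerate(rows, start=1):
        if path_column >= len(fields):
            problems.append((row_num, None))
            continue
        path = fields[path_column]
        # Remote tracks are left for the BigWig reader to fetch.
        is_url = path.startswith(("http://", "https://", "ftp://"))
        if not is_url and not os.path.isfile(path):
            problems.append((row_num, path))
    return problems


# Hypothetical rows: the first looks like a header that was read as data.
rows = [
    ["index", "identifier"],
    ["aorta", "data/missing_track.bw"],
]
for row_num, path in find_unreadable_tracks(rows):
    print(f"row {row_num}: cannot read {path!r}")
```

Running a check like this first makes it clear which row produced the path that could not be opened.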

Best regards, Webb

davek44 commented 5 years ago

Hi Webb, could you provide some more details about the command that you ran? I now suggest using TFRecords rather than HDF5 for input data, generated with my basenji_data.py script. You can see an example of that here: https://github.com/calico/basenji/blob/master/tutorials/preprocess.ipynb

WebbYang commented 5 years ago

Hi Dave, OK, I see! So there's no longer any need to use the file "heart_l131k.h5" in the train_test.ipynb tutorial, right? Thank you very much! Best regards, Webb

davek44 commented 5 years ago

Yes, that's right. I see now that I accidentally left that in the tutorial. I just uploaded a new version that excludes that file.