blajoie / hdf2tab

convert from HDF5 matrix format to txt/tab (tsv) matrix format
Apache License 2.0

cannot find attribute 'genome' #1

Open rckeerthivasan opened 7 years ago

rckeerthivasan commented 7 years ago

I am trying to convert an HDF5 file into a TSV file. I am running the basic command suggested in the usage, but I get the error below. Can you figure out why? The HDF5 files I am trying to convert are from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE77565

Traceback (most recent call last):
  File "scripts/hdf2tab.py", line 706, in <module>
    main()
  File "scripts/hdf2tab.py", line 95, in main
    genome=inhdf.attrs['genome'][:]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2541)
  File "/home/ken/miniconda2/lib/python2.7/site-packages/h5py/_hl/attrs.py", line 58, in __getitem__
    attr = h5a.open(self._id, self._e(name))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2541)
  File "h5py/h5a.pyx", line 77, in h5py.h5a.open (-------src-dir-------/h5py/h5a.c:2086)
KeyError: "Can't open attribute (Can't locate attribute: 'genome')"
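The KeyError above means the object that hdf2tab.py reads attributes from (presumably the file root) has no 'genome' attribute. One way to confirm which attributes a given file actually carries is sketched below, assuming h5py is installed. The helper names `missing_attrs` and `check_file` are hypothetical and not part of hdf2tab; 'genome' is the only attribute name taken from the traceback.

```python
def missing_attrs(attrs, required):
    """Return the names in `required` that are absent from the mapping `attrs`."""
    return [name for name in required if name not in attrs]


def check_file(path, required=("genome",)):
    """Open an HDF5 file read-only and report its root attributes.

    Returns (present, absent): the sorted attribute names found on the
    file root, and the required names that are missing.
    """
    import h5py  # third-party; imported lazily so missing_attrs works without it

    with h5py.File(path, "r") as inhdf:
        present = sorted(inhdf.attrs.keys())
        absent = missing_attrs(inhdf.attrs, required)
    return present, absent
```

Calling `check_file` on the downloaded file and on a file known to work with hdf2tab should show whether the two follow the same layout.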

blajoie commented 7 years ago

Hi - sorry about the delay (I have been traveling). I will look into this later today and push a fix if necessary.

blajoie commented 7 years ago

Hi,

The paper you linked looks to be from a different group, which may be using an entirely different HDF5 file structure to store Hi-C matrices. I doubt this tool will work as-is for the linked data. I would suggest you contact the authors about accessing the HDF5 data.

blajoie commented 7 years ago

Ah! Will take a look then. Sounds like I simply need to make the genome attribute an optional field to support older versions of HiC-HDF5 files.

More soon
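Making the attribute optional could look roughly like the sketch below. This is only an illustration of the idea, not the fix that was pushed to hdf2tab: the helper names and the "unknown" fallback value are assumptions. The lookup works on any mapping, including the `attrs` object of an h5py file.

```python
def read_optional_attr(attrs, name, default=None):
    """Return attrs[name] if the attribute exists, otherwise `default`."""
    if name in attrs:
        return attrs[name]
    return default


def read_genome(path, default="unknown"):
    """Read the 'genome' attribute from an HDF5 file, tolerating its absence.

    `default` is a hypothetical placeholder for files without the attribute.
    """
    import h5py  # third-party; imported lazily so the helper above has no hard dependency

    with h5py.File(path, "r") as inhdf:
        return read_optional_attr(inhdf.attrs, "genome", default)
```

With a lookup like this, a file that lacks the attribute falls back to the default instead of raising the KeyError shown in the traceback.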

On Fri, Aug 4, 2017 at 1:17 AM, wyt14 notifications@github.com wrote:

And the dataset is from your Lab


blajoie commented 7 years ago

Hi,

The script seems to work AOK for the ENCODE files. e.g. https://www.encodeproject.org/experiments/ENCSR079VIJ/

$ python ~/git/hdf2tab/scripts/hdf2tab.py --input ENCFF298ZFN.h5 -v

hdf_blocksize 128 blocksize 128

building bin mask ... x bin_mask 6207 y bin_mask 6207

writing tsv matrix 6207x6207

Can you try again using the latest in git and the above file? Perhaps you were using an older version?


wyt14 commented 7 years ago

Thank you very much! I got the right results by following your instructions.


wyt14 commented 7 years ago

Hi, sorry to disturb you. I have a question about the processed data from your lab. The following two files contain the nested TAD and TAD regions:

ENCODE3-NCIH460-HindIIIhg19genomeC-40000-iced.nested-tads.bed
ENCODE3-NCIH460-HindIIIhg19genomeC-40000-iced.tads.bed

However, I find that some regions in .nested-tads.bed were not merged into .tads.bed, for example the regions marked with a yellow label in the following figure. Did you filter the nested TADs by some criteria before merging them into the TADs file? If so, which criteria did you use? Thank you very much!

I look forward to your reply!

--

Wang Yuting School of Life Sciences Tsinghua University Beijing P.R.China
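To list exactly which nested-TAD regions are absent from the TADs file, a comparison along these lines may help. It is a minimal sketch that assumes both files are plain BED with chrom, start and end in the first three columns, and it treats a nested region as "merged" when it lies entirely inside some interval of the TADs file. That containment rule is an assumption made for illustration, not the lab's actual merging criterion, and the function names are hypothetical.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers and blanks."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def uncovered(nested, tads):
    """Return the nested intervals not fully contained in any interval of `tads`."""
    result = []
    for chrom, start, end in nested:
        contained = any(
            t_chrom == chrom and t_start <= start and end <= t_end
            for t_chrom, t_start, t_end in tads
        )
        if not contained:
            result.append((chrom, start, end))
    return result


def compare_files(nested_path, tads_path):
    """Read two BED files and return the nested regions missing from the TADs file."""
    with open(nested_path) as nested_fh, open(tads_path) as tads_fh:
        return uncovered(parse_bed(nested_fh), parse_bed(tads_fh))
```

The pairwise scan is quadratic in the number of intervals, which should be acceptable for TAD calls at 40 kb resolution.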