HDFGroup / h5serv

Reference service implementation of the HDF5 REST API

cannot concatenate 'str' and 'NoneType' objects on large sparse dataset #112

Closed vjcitn closed 7 years ago

vjcitn commented 7 years ago

a large hdf5 file is distributed by 10x genomics

https://community.10xgenomics.com/t5/10x-Blog/Our-1-3-million-single-cell-dataset-is-ready-to-download/ba-p/276

the basic structure can be seen with R:

h5ls("1M_neurons_filtered_gene_bc_matrices_h5.h5")
  group       name       otype  dclass        dim
0     /       mm10   H5I_GROUP
1 /mm10   barcodes H5I_DATASET  STRING    1306127
2 /mm10       data H5I_DATASET INTEGER 2624828308
3 /mm10 gene_names H5I_DATASET  STRING      27998
4 /mm10      genes H5I_DATASET  STRING      27998
5 /mm10    indices H5I_DATASET INTEGER 2624828308
6 /mm10     indptr H5I_DATASET INTEGER    1306128
7 /mm10      shape H5I_DATASET INTEGER          2
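
The `data`, `indices`, `indptr`, and `shape` datasets look like a compressed sparse column (CSC) layout: `indptr` has one more entry than `barcodes` (1306128 vs. 1306127), which would make each barcode one column. A minimal sketch of how one column is recovered from such arrays, assuming that CSC reading is correct. The tiny arrays and the helper name `column_entries` are made up for illustration and are not taken from the file:

```python
def column_entries(data, indices, indptr, j):
    """Return {row_index: value} for the non-zero entries of column j (CSC layout)."""
    start, stop = indptr[j], indptr[j + 1]  # slice of data/indices owned by column j
    return dict(zip(indices[start:stop], data[start:stop]))


# Toy 3-row x 2-column matrix (hypothetical values):
#   column 0 -> rows 0 and 2 hold 5 and 7
#   column 1 -> row 1 holds 9
data = [5, 7, 9]
indices = [0, 2, 1]
indptr = [0, 2, 3]  # len(indptr) == number of columns + 1

first_column = column_entries(data, indices, indptr, 0)
second_column = column_entries(data, indices, indptr, 1)
```

With this layout only the slice between two consecutive `indptr` entries needs to be read to get one column, so a client does not have to load the full `data` and `indices` arrays.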

when this file is placed in the h5serv data folder, the server can see it, but a request for it throws "cannot concatenate 'str' and 'NoneType' objects":

(h5serv) ubuntu@ip-172-30-0-173:/datam/h5serv/data$
INFO:app.py:284::getFilePath: 1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org checkExists: True
INFO:app.py:286::tocFilePath: ../data/.toc.h5
('host:', u'1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org', 'topdomain:', 'hdfgroup.org')
INFO:app.py:321::verifyFile: ../data/1M_neurons_filtered_gene_bc_matrices_h5.h5
INFO:fileUtil.py:268::verifyFile('../data/1M_neurons_filtered_gene_bc_matrices_h5.h5', False)
INFO:app.py:188::baseHandler, href: http://54.163.220.201:5000
INFO:app.py:194::REQ GET http://54.163.220.201:5000/?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org {remote_ip: 50.95.128.186}
INFO:hdf5db.py:163::init -- filePath: ../data/1M_neurons_filtered_gene_bc_matrices_h5.h5 mode: r+
ERROR:tornado.application:Uncaught exception GET /?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org (50.95.128.186)
HTTPServerRequest(protocol='http', host='54.163.220.201:5000', method='GET', uri='/?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org', version='HTTP/1.1', remote_ip='50.95.128.186', headers={'Host': '54.163.220.201:5000', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json, text/xml, application/xml, */*', 'User-Agent': 'libcurl/7.53.1 r-curl/2.4 httr/1.2.1'})
Traceback (most recent call last):
  File "/datam/anaconda/envs/h5serv/lib/python2.7/site-packages/tornado/web.py", line 1467, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "app.py", line 2949, in get
    response = self.getRootResponse(self.filePath)
  File "app.py", line 2912, in getRootResponse
    self.log.info("IOError: " + str(e.errno) + " " + e.strerror)
TypeError: cannot concatenate 'str' and 'NoneType' objects
ERROR:tornado.access:500 GET /?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org (50.95.128.186) 1.98ms

jreadey commented 7 years ago

Hi, could you provide an HTTP or S3 path for the download? It wasn't clear to me from the web page where to get the file you referenced.

vjcitn commented 7 years ago

https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/1M_neurons/1M_neurons_filtered_gene_bc_matrices_h5.h5

jreadey commented 7 years ago

I'm not seeing the exception on my system. Do you have the latest updates for h5serv and hdf5-json?

What versions does http://:5000/info report?

Also, try this:

  • stop the server
  • rm /data/.toc.h5
  • restart server

Do you get a crash immediately or when the file is accessed through a REST call?

vjcitn commented 7 years ago

> What versions does http://:5000/info report?

Welcome to h5serv!
h5serv is a webservice for HDF5 data
Documentation: h5serv documentation http://h5serv.readthedocs.org/
server version: 0.2
h5py version: 2.7.0
hdf5 version: 1.8.17

these versions are more current than those reported by the demonstration server: https://data.hdfgroup.org:7258/info

> Also, try this:
>
>   • stop the server
>   • rm /data/.toc.h5
>   • restart server
>
> Do you get a crash immediately or when the file is accessed through a rest call?

the errors i reported involve restful calls. the server continues to operate for other data resources

by the way, the server i am running is publicly accessible EC2. so you can throw requests (e.g., /info) at it if that is at all helpful. at least that is our intention

http://54.163.220.201:5000/info

vjcitn commented 7 years ago

> Also, try this:
>
>   • stop the server
>   • rm /data/.toc.h5
>   • restart server

this did not solve the problem. i also updated (git pull) and installed h5pyd to no avail.

jreadey commented 7 years ago

What are the REST request(s) that cause the crash?

vjcitn commented 7 years ago

it isn't a crash. the server keeps going.

i have several HDF5 files on our server. one is called assays.h5

the request

http://54.163.220.201:5000/?host=assays.hdfgroup.org

yields

{
  "lastModified": "2017-03-24T15:17:35Z",
  "hrefs": [
    {"href": "http://54.163.220.201:5000/?host=assays.hdfgroup.org", "rel": "self"},
    {"href": "http://54.163.220.201:5000/datasets?host=assays.hdfgroup.org", "rel": "database"},
    {"href": "http://54.163.220.201:5000/groups?host=assays.hdfgroup.org", "rel": "groupbase"},
    {"href": "http://54.163.220.201:5000/datatypes?host=assays.hdfgroup.org", "rel": "typebase"},
    {"href": "http://54.163.220.201:5000/groups/0286c478-10a5-11e7-8e4b-125df2002668?host=assays.hdfgroup.org", "rel": "root"}
  ],
  "root": "0286c478-10a5-11e7-8e4b-125df2002668",
  "created": "2017-03-28T15:01:18Z"
}

all is well

the request

http://54.163.220.201:5000/?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org

yields

Traceback (most recent call last):
  File "/datam/anaconda/envs/h5serv/lib/python2.7/site-packages/tornado/web.py", line 1467, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "app.py", line 2949, in get
    response = self.getRootResponse(self.filePath)
  File "app.py", line 2912, in getRootResponse
    self.log.info("IOError: " + str(e.errno) + " " + e.strerror)
TypeError: cannot concatenate 'str' and 'NoneType' objects

if i understand correctly, you have placed the file on your server, and the request

?host=1M_neurons_filtered_gene_bc_matrices_h5.hdfgroup.org

yields something other than a Traceback. I would like to understand whether the error arises from some aspect of my server system, our installation of the server, or something else. Thank you
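
The two requests above follow one pattern: the file name without its `.h5` extension, joined to the top domain, is passed as the `host` query parameter. A minimal sketch of building such URLs, assuming that pattern holds for files placed directly in the data folder (it is inferred from the log lines and requests in this thread, not from the h5serv documentation). The helper names `domain_for_file` and `root_url` are hypothetical:

```python
from urllib.parse import urlencode


def domain_for_file(filename, topdomain="hdfgroup.org"):
    """Map 'assays.h5' to 'assays.hdfgroup.org' (pattern inferred from this thread)."""
    stem = filename[:-3] if filename.endswith(".h5") else filename
    return stem + "." + topdomain


def root_url(endpoint, filename):
    """Build the root request URL, passing the domain as the ?host= parameter."""
    return endpoint + "/?" + urlencode({"host": domain_for_file(filename)})


# Endpoint and file names are the ones quoted in this thread.
assays_url = root_url("http://54.163.220.201:5000", "assays.h5")
neurons_url = root_url(
    "http://54.163.220.201:5000", "1M_neurons_filtered_gene_bc_matrices_h5.h5"
)
```

Files in subfolders of the data directory are not covered by this sketch.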

vjcitn commented 7 years ago

seems to be a memory fault in disguise. on a larger machine this error does not occur.
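
For a sense of scale, here is a rough estimate of how large the biggest datasets in the h5ls listing would be if they were ever read into memory in full. The element counts come from the listing; the 4-byte element width is an assumption, since h5ls reports only INTEGER. This says nothing about what h5serv actually reads when it serves a root request:

```python
# Element counts copied from the h5ls listing in this thread.
ELEMENT_COUNTS = {
    "data": 2624828308,
    "indices": 2624828308,
    "indptr": 1306128,
}

# ASSUMPTION: 4 bytes per element. h5ls only reports INTEGER, not the width.
ASSUMED_BYTES_PER_ELEMENT = 4


def uncompressed_bytes(counts, bytes_per_element):
    """Return {dataset_name: size_in_bytes} for fully materialised arrays."""
    return {name: n * bytes_per_element for name, n in counts.items()}


sizes = uncompressed_bytes(ELEMENT_COUNTS, ASSUMED_BYTES_PER_ELEMENT)
for name, size in sizes.items():
    print("{}: {:.1f} GiB".format(name, size / 2**30))
```

With 8-byte elements the figures would double.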

jreadey commented 7 years ago

Ok, I'm glad you were able to resolve it. The error message was a bit misleading. The "cannot concatenate" error came up in the error handler of a prior exception.
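
To illustrate the masking: the traceback shows the handler at app.py line 2912 concatenating `e.strerror` into a log message. An IOError raised with only a message has `errno` and `strerror` set to None, so the concatenation itself fails and hides the original error. A minimal sketch in Python 3 (the server in this thread ran Python 2.7, where the message reads "cannot concatenate 'str' and 'NoneType' objects"). `fragile_describe` mirrors the line from the traceback; `describe_ioerror` is a hypothetical alternative, not h5serv code:

```python
def describe_ioerror(e):
    """Build a log message that tolerates errno/strerror being None."""
    # str.format renders None as the text "None" instead of raising TypeError;
    # when strerror is missing, fall back to the exception's own message.
    return "IOError: {} {}".format(e.errno, e.strerror or str(e))


def fragile_describe(e):
    """Same pattern as the line in the traceback; fails when strerror is None."""
    return "IOError: " + str(e.errno) + " " + e.strerror


plain = IOError("unable to open file")           # errno and strerror are both None
coded = IOError(2, "No such file or directory")  # errno=2, strerror set

try:
    fragile_describe(plain)
    masked = False
except TypeError:
    masked = True  # the logging line raised, hiding the original IOError

safe_plain = describe_ioerror(plain)
safe_coded = describe_ioerror(coded)
```

Whether the underlying IOError here really carried no `strerror` is an assumption based on the traceback; the sketch only shows how such a case turns into a TypeError.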