dhimmel / learn

Machine learning and feature extraction for the Rephetio project
https://doi.org/10.15363/thinklab.d210

Failing feature extraction queries due to py2neo's socket timeout #1

Closed: dhimmel closed this issue 8 years ago

dhimmel commented 8 years ago

py2neo 2.0.8 uses a default socket timeout of 30 seconds for each Cypher query. The timeout can be adjusted by overriding the module-level default, py2neo.packages.httpstream.http.socket_timeout.
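Based on the description above, the override needs to run once near the top of the extraction notebook, before any queries are sent. This is a minimal sketch that takes the py2neo 2.x module path quoted above as given. set_py2neo_socket_timeout is a hypothetical helper. The guarded import is there only so the snippet still runs when py2neo 2.x is not installed.

```python
import importlib


def set_py2neo_socket_timeout(seconds):
    """Override py2neo 2.x's module-level HTTP socket timeout.

    Hypothetical helper: returns True if the override was applied, or
    False if the py2neo 2.x httpstream module cannot be imported.
    """
    try:
        http = importlib.import_module("py2neo.packages.httpstream.http")
    except ImportError:
        return False
    # Replace the module-level default (30 seconds in py2neo 2.0.8).
    http.socket_timeout = seconds
    return True


# Call before issuing any queries; the value here is illustrative.
applied = set_py2neo_socket_timeout(60 * 60)
```

Because the override is a module attribute, it has to run in the same process (and before the queries) that it is meant to affect. That is why a notebook that stops importing a module performing the override also loses the longer timeout.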

Commit 7788972700cad4041d81fe1cb870e885e1ec0a28 did not increase the default timeout in all-features/3-extract.ipynb, so some queries failed silently and were omitted from all-features/data/dwpc.tsv.bz2. Before that commit, the notebook imported hetio.neo4j, which overrides the default timeout, so the issue didn't arise.
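To keep timeouts from silently shrinking the output, you can record every failed query together with its runtime. This is a minimal sketch that uses concurrent.futures.ThreadPoolExecutor to run queries in parallel threads, which may differ from the threading setup the notebook actually uses. run_all and run_query are hypothetical names, not functions from the Rephetio code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(queries, run_query, max_workers=4):
    """Run queries in a thread pool and return (results, failures).

    run_query is a hypothetical callable that executes one query and
    returns its value (e.g. a DWPC). Any exception it raises is kept in
    failures rather than being dropped.
    """
    def timed(query):
        start = time.perf_counter()
        try:
            value, error = run_query(query), None
        except Exception as exc:  # e.g. socket.timeout from the HTTP layer
            value, error = None, exc
        return query, value, error, time.perf_counter() - start

    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields outcomes in the same order as queries.
        for query, value, error, runtime in executor.map(timed, queries):
            if error is None:
                results.append((query, value, runtime))
            else:
                failures.append((query, repr(error), runtime))
    return results, failures
```

Writing failures to its own file, or asserting that it is empty before saving results, makes omitted queries visible instead of letting them quietly disappear from the output table.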

In total, we performed 27,315,900 queries. Below is the upper tail of the runtime distribution as measured in Python.

Figure: Query Timeout Distribution

Note the subset of queries that appeared to take about twice the 30-second socket_timeout default. I'm not ready to call this a py2neo issue, because we run many queries in parallel using threading, which complicates the diagnosis.
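To quantify the cluster near twice the timeout, you can count how many measured runtimes reach each multiple of socket_timeout. This is a minimal sketch that assumes runtimes is a sequence of per-query wall-clock seconds, like the ones measured in Python above. tail_counts is a hypothetical helper.

```python
def tail_counts(runtimes, timeout=30.0, multiples=(1, 2)):
    """Return {k: number of runtimes >= k * timeout} for each multiple k.

    timeout defaults to py2neo 2.0.8's 30-second socket timeout.
    """
    return {k: sum(1 for r in runtimes if r >= k * timeout) for k in multiples}
```

Comparing the count at 2 × timeout with the count at 1 × timeout shows what share of the long-running queries fall in the doubled-time cluster.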