ODM2 / WOFpy

A server-side implementation of CUAHSI's Water One Flow service stack in Python.
http://odm2.github.io/WOFpy/

occasional error: SSL SYSCALL error: EOF detected #150

Open miguelcleon opened 7 years ago

miguelcleon commented 7 years ago

Seemingly at random, I get the error below. WOFpy then stops working and I have to reload Apache to get it working again. I've been trying to find a reproducible way to trigger this error but haven't found one yet; if I do, I'll update this issue. I was also going to post the second error that follows this one, but because I can't reproduce the first, I'm not getting it right now. I'll add the second error when it happens again.


<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
(psycopg2.OperationalError) SSL SYSCALL error: EOF detected [SQL: 'SELECT DISTINCT odm2.sites.samplingfeatureid AS odm2_sites_samplingfeatureid, odm2.samplingfeatures.samplingfeatureid AS odm2_samplingfeatures_samplingfeatureid, odm2.sites.spatialreferenceid AS odm2_sites_spatialreferenceid, odm2.sites.sitetypecv AS odm2_sites_sitetypecv, odm2.sites.latitude AS odm2_sites_latitude, odm2.sites.longitude AS odm2_sites_longitude, odm2.samplingfeatures.samplingfeatureuuid AS odm2_samplingfeatures_samplingfeatureuuid, odm2.samplingfeatures.samplingfeaturetypecv AS odm2_samplingfeatures_samplingfeaturetypecv, odm2.samplingfeatures.samplingfeaturecode AS odm2_samplingfeatures_samplingfeaturecode, odm2.samplingfeatures.samplingfeaturename AS odm2_samplingfeatures_samplingfeaturename, odm2.samplingfeatures.samplingfeaturedescription AS odm2_samplingfeatures_samplingfeaturedescription, odm2.samplingfeatures.samplingfeaturegeotypecv AS odm2_samplingfeatures_samplingfeaturegeotypecv, odm2.samplingfeatures.elevation_m AS odm2_samplingfeatures_elevation_m, odm2.samplingfeatures.elevationdatumcv AS odm2_samplingfeatures_elevationdatumcv, odm2.samplingfeatures.featuregeometrywkt AS odm2_samplingfeatures_featuregeometrywkt, CASE WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_1)s) THEN %(param_1)s WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_2)s) THEN %(param_2)s ELSE %(param_3)s END AS _sa_polymorphic_on \nFROM odm2.samplingfeatures JOIN odm2.sites ON odm2.samplingfeatures.samplingfeatureid = odm2.sites.samplingfeatureid JOIN odm2.featureactions ON odm2.samplingfeatures.samplingfeatureid = odm2.featureactions.samplingfeatureid JOIN (odm2.results JOIN odm2.timeseriesresults ON odm2.results.resultid = odm2.timeseriesresults.resultid) ON odm2.featureactions.featureactionid = odm2.results.featureactionid \nWHERE odm2.featureactions.samplingfeatureid = odm2.sites.samplingfeatureid AND odm2.results.featureactionid = 
odm2.featureactions.featureactionid AND odm2.sites.latitude >= %(latitude_1)s AND odm2.sites.latitude <= %(latitude_2)s AND odm2.sites.longitude >= %(longitude_1)s AND odm2.sites.longitude <= %(longitude_2)s'] [parameters: {'longitude_1': -114.0, 'longitude_2': -110.0, 'param_1': 'Specimen', 'param_3': 'samplingfeatures', 'param_2': 'Site', 'latitude_2': 42.0, 'latitude_1': 40.0, 'samplingfeaturetypecv_2': 'Site', 'samplingfeaturetypecv_1': 'Specimen'}]
</faultstring>
<faultactor/>
</ns0:Fault>
miguelcleon commented 7 years ago

I'm thinking this may well be the system running out of RAM to hold the data in local memory while it's pulling SQL records into Django querysets. I've run into that problem while doing data ingestion and had to write some server-side scripts to break files into smaller pieces. I don't get this same error there, but I suspect it is just manifesting differently in this setting.

lsetiawan commented 7 years ago

@miguelcleon It seems like you have encountered this problem before and solved it? https://github.com/ODM2/WOFpy/issues/73#issuecomment-309090364

miguelcleon commented 7 years ago

@lsetiawan It's an issue that appears intermittently. I initially thought it was the file system running out of space, but that doesn't appear to be the case. When you reload Apache the error goes away and might not appear again for a bit.

lsetiawan commented 7 years ago

Hmm... okay.

> system running out of ram to store the data in local memory while it's pulling sql records into Django querysets.

How does Django come into play with WOFpy?

emiliom commented 7 years ago

The "lazy-apps" setting @lsetiawan has mentioned before, which fixed this issue with a PostgreSQL backend, is probably specific to nginx, right Don? Assuming it is, maybe there's an equivalent setting/flag in Apache?

lsetiawan commented 7 years ago

lazy-apps is a uWSGI setting. Apache seems to be usually paired with mod_wsgi, at least according to the Flask documentation (http://flask.pocoo.org/docs/0.12/deploying/mod_wsgi/)

miguelcleon commented 7 years ago

With the DAO, I would think: loading querysets with lots of SQL records. I'm kind of guessing about the RAM thing; it would need more testing to figure out whether that is really happening. Actually, all I'd need to do is pull a huge time series and watch top.
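
A rough way to check the RAM hypothesis without a production database is to compare loading a whole result set at once against reading it in batches. The sketch below is only an illustration under stated assumptions: sqlite3 stands in for PostgreSQL, the table `demo_values` and all function names are hypothetical, and `tracemalloc` only counts Python-level allocations, so it approximates rather than matches what `top` would report.

```python
import sqlite3
import tracemalloc


def make_demo_db(n_rows):
    """Build an in-memory table standing in for a large time series."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_values (valueid INTEGER, datavalue REAL)")
    conn.executemany(
        "INSERT INTO demo_values VALUES (?, ?)",
        ((i, i * 0.5) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def total_fetchall(conn):
    """Materialize every row at once, then sum the data values."""
    rows = conn.execute("SELECT valueid, datavalue FROM demo_values").fetchall()
    return sum(value for _, value in rows)


def total_streamed(conn, batch_size=1000):
    """Sum the data values while holding at most batch_size rows."""
    cursor = conn.execute("SELECT valueid, datavalue FROM demo_values")
    total = 0.0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += sum(value for _, value in batch)
    return total


def peak_python_bytes(func, *args):
    """Return (result, peak bytes of Python allocations) for one call."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


conn = make_demo_db(200_000)
all_total, all_peak = peak_python_bytes(total_fetchall, conn)
stream_total, stream_peak = peak_python_bytes(total_streamed, conn)
print(f"fetchall peak: {all_peak / 1e6:.1f} MB")
print(f"streamed peak: {stream_peak / 1e6:.1f} MB")
```

Note that with psycopg2 a default client-side cursor still transfers the full result set to the client, so batching on a real PostgreSQL backend would likely need a server-side (named) cursor to have the same effect.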

miguelcleon commented 7 years ago

After you get the EOF error:

From here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetVariableInfo?variable=odm2timeseries:DO%20Concentration you get:

<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
(sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back [SQL: u'SELECT DISTINCT ON (odm2.variables.variableid) odm2.timeseriesresultvalues.valueid AS odm2_timeseriesresultvalues_valueid, odm2.timeseriesresultvalues.resultid AS odm2_timeseriesresultvalues_resultid, odm2.timeseriesresultvalues.datavalue AS odm2_timeseriesresultvalues_datavalue, odm2.timeseriesresultvalues.valuedatetime AS odm2_timeseriesresultvalues_valuedatetime, odm2.timeseriesresultvalues.valuedatetimeutcoffset AS odm2_timeseriesresultvalues_valuedatetimeutcoffset, odm2.timeseriesresultvalues.censorcodecv AS odm2_timeseriesresultvalues_censorcodecv, odm2.timeseriesresultvalues.qualitycodecv AS odm2_timeseriesresultvalues_qualitycodecv, odm2.timeseriesresultvalues.timeaggregationinterval AS odm2_timeseriesresultvalues_timeaggregationinterval, odm2.timeseriesresultvalues.timeaggregationintervalunitsid AS odm2_timeseriesresultvalues_timeaggregationintervalunitsi_1 \nFROM odm2.timeseriesresultvalues JOIN (odm2.results JOIN odm2.timeseriesresults ON odm2.results.resultid = odm2.timeseriesresults.resultid) ON odm2.timeseriesresults.resultid = odm2.timeseriesresultvalues.resultid JOIN odm2.variables ON odm2.variables.variableid = odm2.results.variableid \nWHERE odm2.variables.variableid = odm2.results.variableid AND odm2.variables.variablecode = %(variablecode_1)s'] [parameters: [{}]]
</faultstring>
<faultactor/>
</ns0:Fault>

From here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetSites?site=odm2timeseries:Rio%20Icacos%20Trib-IO you get:

<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>Site odm2timeseries:Rio Icacos Trib-IO Not Found</faultstring>
<faultactor/>
</ns0:Fault>

From here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetValues?location=odm2timeseries:Rio%20Icacos%20Trib-IO&variable=odm2timeseries:DO%20Concentration you get:


<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
Values Not Found for Rio Icacos Trib-IO:DO Concentration for dates None - None
</faultstring>
<faultactor/>
</ns0:Fault>
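
The "Can't reconnect until invalid transaction is rolled back" fault above is what SQLAlchemy raises when a session is reused after a failed statement without calling rollback(). If the DAO keeps one long-lived session (an assumption, not confirmed in this thread), a per-request scope that always rolls back on error might keep one dropped connection from breaking every later request. The sketch below uses a hypothetical `FakeSession` stand-in so it runs without a database; a real SQLAlchemy Session exposes the same commit(), rollback() and close() methods. `handle_request` and `factory` are also illustrative names only.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in with the commit/rollback/close methods of a Session."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def session_scope(session_factory):
    """Open a session per request; roll back on any error, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        # Without this rollback the session would stay in an invalid
        # transaction and later queries on it would keep failing.
        session.rollback()
        raise
    finally:
        session.close()


def handle_request(session_factory, fail=False):
    """Hypothetical request handler; `fail` simulates a dropped connection."""
    with session_scope(session_factory) as session:
        if fail:
            raise ConnectionError("SSL SYSCALL error: EOF detected")
        return session


created = []


def factory():
    session = FakeSession()
    created.append(session)
    return session


handle_request(factory)
try:
    handle_request(factory, fail=True)
except ConnectionError:
    pass

print(created[0].calls)  # ['commit', 'close']
print(created[1].calls)  # ['rollback', 'close']
```

The design choice here is that the failure still propagates to the caller, but the session is left clean, so the next request does not inherit the broken transaction.
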
lsetiawan commented 7 years ago

@miguelcleon if you restart WOFpy, what happens then?

miguelcleon commented 7 years ago

It will work again.

lsetiawan commented 7 years ago

I am currently using your ODM2LCZO Database for testing. I am not seeing the problem there so far.

miguelcleon commented 7 years ago

Prior to the EOF error I got the timeout error below. Unfortunately, the problem doesn't seem to be reproducible.


Fault: Fault(Server: "(psycopg2.DatabaseError) SSL SYSCALL error: Connection timed out\\n [SQL: 'SELECT DISTINCT odm2.sites.samplingfeatureid AS odm2_sites_samplingfeatureid, odm2.samplingfeatures.samplingfeatureid AS odm2_samplingfeatures_samplingfeatureid, odm2.sites.spatialreferenceid AS odm2_sites_spatialreferenceid, odm2.sites.sitetypecv AS odm2_sites_sitetypecv, odm2.sites.latitude AS odm2_sites_latitude, odm2.sites.longitude AS odm2_sites_longitude, odm2.samplingfeatures.samplingfeatureuuid AS odm2_samplingfeatures_samplingfeatureuuid, odm2.samplingfeatures.samplingfeaturetypecv AS odm2_samplingfeatures_samplingfeaturetypecv, odm2.samplingfeatures.samplingfeaturecode AS odm2_samplingfeatures_samplingfeaturecode, odm2.samplingfeatures.samplingfeaturename AS odm2_samplingfeatures_samplingfeaturename, odm2.samplingfeatures.samplingfeaturedescription AS odm2_samplingfeatures_samplingfeaturedescription, odm2.samplingfeatures.samplingfeaturegeotypecv AS odm2_samplingfeatures_samplingfeaturegeotypecv, odm2.samplingfeatures.elevation_m AS odm2_samplingfeatures_elevation_m, odm2.samplingfeatures.elevationdatumcv AS odm2_samplingfeatures_elevationdatumcv, odm2.samplingfeatures.featuregeometrywkt AS odm2_samplingfeatures_featuregeometrywkt, CASE WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_1)s) THEN %(param_1)s WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_2)s) THEN %(param_2)s ELSE %(param_3)s END AS _sa_polymorphic_on \\\\nFROM odm2.samplingfeatures JOIN odm2.sites ON odm2.samplingfeatures.samplingfeatureid = odm2.sites.samplingfeatureid JOIN odm2.featureactions ON odm2.samplingfeatures.samplingfeatureid = odm2.featureactions.samplingfeatureid JOIN (odm2.results JOIN odm2.timeseriesresults ON odm2.results.resultid = odm2.timeseriesresults.resultid) ON odm2.featureactions.featureactionid = odm2.results.featureactionid \\\\nWHERE odm2.featureactions.samplingfeatureid = odm2.sites.samplingfeatureid AND 
odm2.results.featureactionid = odm2.featureactions.featureactionid'] [parameters: {'param_1': 'Specimen', 'param_2': 'Site', 'samplingfeaturetypecv_2': 'Site', 'param_3': 'samplingfeatures', 'samplingfeaturetypecv_1': 'Specimen'}]")
miguelcleon commented 7 years ago

I pulled the timeout error from the Apache error log.

lsetiawan commented 7 years ago

Are you using mod_wsgi or uWSGI?

miguelcleon commented 7 years ago

mod_wsgi

lsetiawan commented 7 years ago

I think you don't have "graceful" reloading in place. So every time you reload the browser, it's not killing the session, and eventually your database gets overwhelmed. lazy-apps fixes that with a uWSGI/NGINX setup. I am not sure what the equivalent is for a mod_wsgi/Apache setup, unless you have already figured it out and it's still not working.
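
For context, uWSGI by default loads the application in the master process and then forks workers, while lazy-apps loads it separately in each worker. One possible reading of why that helps is that workers stop sharing a database connection opened before the fork, which is a commonly reported cause of "SSL SYSCALL error: EOF detected" with psycopg2. Whether that applies to this mod_wsgi deployment is not established here. The sketch below shows the general idea of tying a connection to the process that opened it; `PerProcessConnection` and `open_demo_connection` are hypothetical names, and sqlite3 stands in for PostgreSQL.

```python
import os
import sqlite3


class PerProcessConnection:
    """Create the connection lazily and never reuse one opened by another PID."""

    def __init__(self, factory):
        self._factory = factory
        self._conn = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # Either first use, or this is a forked child that inherited
            # the parent's handle: open a fresh connection for this process.
            self._conn = self._factory()
            self._pid = pid
        return self._conn


def open_demo_connection():
    # sqlite3 stands in for the PostgreSQL connection in this sketch.
    return sqlite3.connect(":memory:")


holder = PerProcessConnection(open_demo_connection)
first = holder.get()
second = holder.get()
print(first is second)
```

Within one process the same connection is reused; after a fork the PID check fails and the child opens its own. The sketch does not address how to dispose of the inherited handle safely.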

miguelcleon commented 7 years ago

OK, I'll look into that.

lsetiawan commented 7 years ago

@miguelcleon I have hopefully provided a fix for your EOF problem. Please try deploying your WOFpy server again with the latest copy of master and let me know if you still encounter the error. Thanks.