Open miguelcleon opened 7 years ago
I'm thinking this may well be the system running out of RAM to store the data in local memory while it's pulling SQL records into Django querysets. I've run into that problem while doing data ingestion and had to write some server-side scripts to break files into smaller pieces. I don't get this same error, but I suspect it is just manifesting differently in this setting.
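For reference, breaking an ingest into smaller pieces can be sketched roughly like this. It is only an illustration of the idea, not the actual server-side scripts; `iter_chunks` and the chunk size are hypothetical names and values.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable.

    Only one chunk is held in memory at a time, so a large file can be
    processed without loading every record at once.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical usage: a small in-memory list stands in for the lines
    # of a large data file.
    lines = ["row-%d" % i for i in range(7)]
    for chunk in iter_chunks(lines, 3):
        print(len(chunk), chunk[0])
```

A file object can be passed in place of the list, since iterating over it yields one line at a time.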
@miguelcleon It seems like you have encountered this problem before and solved it? https://github.com/ODM2/WOFpy/issues/73#issuecomment-309090364
@lsetiawan So it's an issue that appears intermittently. I initially thought it was the file system running out of space, but that doesn't appear to be the case. When you reload Apache the error goes away and might not appear again for a bit.
Hmm... okay.
"system running out of ram to store the data in local memory while it's pulling sql records into Django querysets."
How does Django come into play with WOFpy?
The "lazy-apps" setting @lsetiawan has mentioned before, which fixed this issue with a PostgreSQL backend, is probably specific to nginx, right Don? Assuming it is, maybe there's an equivalent setting/flag in Apache?
lazy-apps is specific to uWSGI settings. Seems like Apache is usually paired with mod_wsgi, at least from Flask documentation (http://flask.pocoo.org/docs/0.12/deploying/mod_wsgi/)
With the DAO, I would think, loading querysets with lots of SQL records. I'm kind of guessing with the RAM thing; it would need more testing to figure out if that is really happening. Actually, all I'd need to do is pull a huge time series and watch top.
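A rough way to check the RAM guess without a live database is to compare peak memory when a result set is materialized all at once versus consumed in batches. The sketch below uses Python's `tracemalloc`; `fake_rows` is a hypothetical stand-in for a database cursor, not WOFpy or ODM2 code, and the row counts are arbitrary.

```python
import tracemalloc


def fake_rows(n):
    """Hypothetical stand-in for a database cursor: yields (id, value) rows."""
    for i in range(n):
        yield (i, float(i))


def load_all(n):
    """Materialize every row in a list before processing it."""
    rows = list(fake_rows(n))
    return sum(value for _, value in rows)


def load_batched(n, batch_size=1000):
    """Process rows in fixed-size batches, keeping only one batch in memory."""
    total = 0.0
    batch = []
    for row in fake_rows(n):
        batch.append(row)
        if len(batch) >= batch_size:
            total += sum(value for _, value in batch)
            batch.clear()
    total += sum(value for _, value in batch)  # leftover partial batch
    return total


def peak_bytes(fn):
    """Run fn() and return the peak memory traced by tracemalloc, in bytes."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 100000
    print("all at once:", peak_bytes(lambda: load_all(n)), "bytes")
    print("batched:    ", peak_bytes(lambda: load_batched(n)), "bytes")
```

This only measures Python-level allocations inside one process; watching top while pulling a real time series is still the more direct test on the server.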
After you get the EOF error:
From here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetVariableInfo?variable=odm2timeseries:DO%20Concentration you get:
<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
(sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back [SQL: u'SELECT DISTINCT ON (odm2.variables.variableid) odm2.timeseriesresultvalues.valueid AS odm2_timeseriesresultvalues_valueid, odm2.timeseriesresultvalues.resultid AS odm2_timeseriesresultvalues_resultid, odm2.timeseriesresultvalues.datavalue AS odm2_timeseriesresultvalues_datavalue, odm2.timeseriesresultvalues.valuedatetime AS odm2_timeseriesresultvalues_valuedatetime, odm2.timeseriesresultvalues.valuedatetimeutcoffset AS odm2_timeseriesresultvalues_valuedatetimeutcoffset, odm2.timeseriesresultvalues.censorcodecv AS odm2_timeseriesresultvalues_censorcodecv, odm2.timeseriesresultvalues.qualitycodecv AS odm2_timeseriesresultvalues_qualitycodecv, odm2.timeseriesresultvalues.timeaggregationinterval AS odm2_timeseriesresultvalues_timeaggregationinterval, odm2.timeseriesresultvalues.timeaggregationintervalunitsid AS odm2_timeseriesresultvalues_timeaggregationintervalunitsi_1 \nFROM odm2.timeseriesresultvalues JOIN (odm2.results JOIN odm2.timeseriesresults ON odm2.results.resultid = odm2.timeseriesresults.resultid) ON odm2.timeseriesresults.resultid = odm2.timeseriesresultvalues.resultid JOIN odm2.variables ON odm2.variables.variableid = odm2.results.variableid \nWHERE odm2.variables.variableid = odm2.results.variableid AND odm2.variables.variablecode = %(variablecode_1)s'] [parameters: [{}]]
</faultstring>
<faultactor/>
</ns0:Fault>
from here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetSites?site=odm2timeseries:Rio%20Icacos%20Trib-IO you get:
<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>Site odm2timeseries:Rio Icacos Trib-IO Not Found</faultstring>
<faultactor/>
</ns0:Fault>
from here http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetValues?location=odm2timeseries:Rio%20Icacos%20Trib-IO&variable=odm2timeseries:DO%20Concentration you get:
<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
Values Not Found for Rio Icacos Trib-IO:DO Concentration for dates None - None
</faultstring>
<faultactor/>
</ns0:Fault>
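The first fault above ("Can't reconnect until invalid transaction is rolled back") is what SQLAlchemy raises when a session is reused after a failed transaction without a rollback. How WOFpy's DAO manages its session isn't shown in this thread, so the following is only a generic sketch of the usual commit-or-rollback pattern. `session_scope` and `session_factory` are hypothetical names; the factory is assumed to return an object with `commit()`, `rollback()` and `close()`, as a SQLAlchemy session does.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Provide a transactional scope around a series of operations.

    session_factory is assumed to be a callable (for example a SQLAlchemy
    sessionmaker) returning an object with commit(), rollback() and close().
    """
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        # Roll back so the failed transaction is not left dangling for
        # whichever request uses the connection next.
        session.rollback()
        raise
    finally:
        session.close()
```

With something like this wrapped around each request's queries, a dropped connection would still fail that one request, but the rollback should keep later requests from inheriting the invalid transaction.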
@miguelcleon if you restart WOFpy, what happens then?
It will work again.
I am currently using your ODM2LCZO Database for testing. I am not seeing the problem there so far.
Prior to the EOF error I got the timeout error below. Unfortunately, the problem doesn't seem to be reproducible.
Fault: Fault(Server: "(psycopg2.DatabaseError) SSL SYSCALL error: Connection timed out\\n [SQL: 'SELECT DISTINCT odm2.sites.samplingfeatureid AS odm2_sites_samplingfeatureid, odm2.samplingfeatures.samplingfeatureid AS odm2_samplingfeatures_samplingfeatureid, odm2.sites.spatialreferenceid AS odm2_sites_spatialreferenceid, odm2.sites.sitetypecv AS odm2_sites_sitetypecv, odm2.sites.latitude AS odm2_sites_latitude, odm2.sites.longitude AS odm2_sites_longitude, odm2.samplingfeatures.samplingfeatureuuid AS odm2_samplingfeatures_samplingfeatureuuid, odm2.samplingfeatures.samplingfeaturetypecv AS odm2_samplingfeatures_samplingfeaturetypecv, odm2.samplingfeatures.samplingfeaturecode AS odm2_samplingfeatures_samplingfeaturecode, odm2.samplingfeatures.samplingfeaturename AS odm2_samplingfeatures_samplingfeaturename, odm2.samplingfeatures.samplingfeaturedescription AS odm2_samplingfeatures_samplingfeaturedescription, odm2.samplingfeatures.samplingfeaturegeotypecv AS odm2_samplingfeatures_samplingfeaturegeotypecv, odm2.samplingfeatures.elevation_m AS odm2_samplingfeatures_elevation_m, odm2.samplingfeatures.elevationdatumcv AS odm2_samplingfeatures_elevationdatumcv, odm2.samplingfeatures.featuregeometrywkt AS odm2_samplingfeatures_featuregeometrywkt, CASE WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_1)s) THEN %(param_1)s WHEN (odm2.samplingfeatures.samplingfeaturetypecv = %(samplingfeaturetypecv_2)s) THEN %(param_2)s ELSE %(param_3)s END AS _sa_polymorphic_on \\\\nFROM odm2.samplingfeatures JOIN odm2.sites ON odm2.samplingfeatures.samplingfeatureid = odm2.sites.samplingfeatureid JOIN odm2.featureactions ON odm2.samplingfeatures.samplingfeatureid = odm2.featureactions.samplingfeatureid JOIN (odm2.results JOIN odm2.timeseriesresults ON odm2.results.resultid = odm2.timeseriesresults.resultid) ON odm2.featureactions.featureactionid = odm2.results.featureactionid \\\\nWHERE odm2.featureactions.samplingfeatureid = odm2.sites.samplingfeatureid AND 
odm2.results.featureactionid = odm2.featureactions.featureactionid'] [parameters: {'param_1': 'Specimen', 'param_2': 'Site', 'samplingfeaturetypecv_2': 'Site', 'param_3': 'samplingfeatures', 'samplingfeaturetypecv_1': 'Specimen'}]")
I pulled the timeout error from the apache error log.
Are you using mod_wsgi or uWSGI?
mod_wsgi
I think you don't have a "graceful" reloading in place. So every time you reload the browser, it's not killing the session, so eventually your database gets overwhelmed. lazy-apps fixes that with a uWSGI/NGINX setup. I am not sure what the equivalent is for a mod_wsgi/Apache setup, unless you already figured it out and it's still not working.
ok, I'll look into that.
@miguelcleon I have hopefully provided a fix for your EOF problem. Please try deploying your WOFpy server again with the latest copy of master and let me know if you still encounter the error. Thanks.
Seemingly at random, I'll get the error below. Then WOFpy stops working and I need to reload Apache to get it working again. I've been trying to figure out a reproducible way to get this error but I haven't found it yet. If I do, I'll update this issue. I was also going to post the second error you get after this one, but because I can't reproduce it, I'm not getting it now. I'll add the second error when it happens again.