EpistasisLab / Aliro

Aliro: AI-Driven Data Science
https://epistasislab.github.io/Aliro
GNU General Public License v3.0

Dataset file size #587

Open jay-m-dev opened 1 year ago

jay-m-dev commented 1 year ago

Loading large datasets (about 150MB) fails at the get_metafeatures step with an unknown error. The error is raised both when uploading through the GUI and when placing the file in the data/datasets/user directory.

alirogpt-lab-1 | 0|lab | child process exited with code null
alirogpt-lab-1 | 0|lab | Error, pythonProcessAsync process exited with status undefined, args: 'ai/metalearning/get_metafeatures.py,649b30d19a7e2b0140231744,-target,target,-identifier_type,fileid,-prediction_type,classification', stderr: 'null', stdout: 'null'
alirogpt-lab-1 | 0|lab | Error: Error, pythonProcessAsync process exited with status undefined, args: 'ai/metalearning/get_metafeatures.py,649b30d19a7e2b0140231744,-target,target,-identifier_type,fileid,-prediction_type,classification', stderr: 'null', stdout: 'null'
alirogpt-lab-1 | 0|lab | at ChildProcess.<anonymous> (/appsrc/lab/pyutils.js:200:11)
alirogpt-lab-1 | 0|lab | at ChildProcess.emit (node:events:513:28)
alirogpt-lab-1 | 0|lab | at maybeClose (node:internal/child_process:1100:16)
alirogpt-lab-1 | 0|lab | at Socket.<anonymous> (node:internal/child_process:458:11)
alirogpt-lab-1 | 0|lab | at Socket.emit (node:events:513:28)
alirogpt-lab-1 | 0|lab | at Pipe.<anonymous> (node:net:301:12)
alirogpt-lab-1 | 1|ai | ai: INFO: 2023 06:57:46 PM UTC: checking results...
alirogpt-lab-1 | PM2 | App [lab:0] exited with code [1] via signal [SIGINT]
alirogpt-lab-1 | PM2 | App [lab:0] starting in -fork mode-
alirogpt-lab-1 | 1|ai | api_utils: ERROR: Unexpected error in LabApi.request for path 'POST:http://lab:5080/api/experiments':<class 'requests.exceptions.ConnectionError'>
alirogpt-lab-1 | 1|ai | ai: ERROR: Unhanded exception caught: <class 'requests.exceptions.ConnectionError'>
alirogpt-lab-1 | 1|ai | ai: INFO: Shutting down AI engine...
alirogpt-lab-1 | 1|ai | ai: INFO: ...Shutting down Request Manager...
alirogpt-lab-1 | 1|ai | ai: INFO: Goodbye
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
alirogpt-lab-1 | 1|ai | chunked=chunked,
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 449, in _make_request
alirogpt-lab-1 | 1|ai | six.raise_from(e, None)
alirogpt-lab-1 | 1|ai | File "<string>", line 3, in raise_from
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 444, in _make_request
alirogpt-lab-1 | 1|ai | httplib_response = conn.getresponse()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
alirogpt-lab-1 | 1|ai | response.begin()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
alirogpt-lab-1 | 1|ai | version, status, reason = self._read_status()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 280, in _read_status
alirogpt-lab-1 | 1|ai | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
alirogpt-lab-1 | 1|ai | return self._sock.recv_into(b)
alirogpt-lab-1 | 1|ai | ConnectionResetError: [Errno 104] Connection reset by peer
alirogpt-lab-1 | 1|ai | During handling of the above exception, another exception occurred:
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
alirogpt-lab-1 | 1|ai | timeout=timeout
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 788, in urlopen
alirogpt-lab-1 | 1|ai | method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
alirogpt-lab-1 | 1|ai | raise six.reraise(type(error), error, _stacktrace)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
alirogpt-lab-1 | 1|ai | raise value.with_traceback(tb)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
alirogpt-lab-1 | 1|ai | chunked=chunked,
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 449, in _make_request
alirogpt-lab-1 | 1|ai | six.raise_from(e, None)
alirogpt-lab-1 | 1|ai | File "<string>", line 3, in raise_from
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 444, in _make_request
alirogpt-lab-1 | 1|ai | httplib_response = conn.getresponse()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
alirogpt-lab-1 | 1|ai | response.begin()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
alirogpt-lab-1 | 1|ai | version, status, reason = self._read_status()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 280, in _read_status
alirogpt-lab-1 | 1|ai | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
alirogpt-lab-1 | 1|ai | return self._sock.recv_into(b)
alirogpt-lab-1 | 1|ai | urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
alirogpt-lab-1 | 1|ai | During handling of the above exception, another exception occurred:
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
alirogpt-lab-1 | 1|ai | "__main__", mod_spec)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
alirogpt-lab-1 | 1|ai | exec(code, run_globals)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 661, in <module>
alirogpt-lab-1 | 1|ai | main()
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 636, in main
alirogpt-lab-1 | 1|ai | if pennai.check_results():
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 366, in check_results
alirogpt-lab-1 | 1|ai | last_update=self.last_update)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 236, in get_new_experiments_as_dataframe
alirogpt-lab-1 | 1|ai | data = self.get_new_experiments(last_update)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 221, in get_new_experiments
alirogpt-lab-1 | 1|ai | res = self.request(path=self.exp_path, payload=payload)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 482, in __request
alirogpt-lab-1 | 1|ai | headers=headers)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
alirogpt-lab-1 | 1|ai | return session.request(method=method, url=url, **kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
alirogpt-lab-1 | 1|ai | resp = self.send(prep, **send_kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
alirogpt-lab-1 | 1|ai | r = adapter.send(request, **kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
alirogpt-lab-1 | 1|ai | raise ConnectionError(err, request=request)
alirogpt-lab-1 | 1|ai | requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
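
A note on reading the first log line: in Node.js, a child process whose exit `code` is `null` was terminated by a signal rather than exiting on its own. That is consistent with, but does not prove, the Python process being killed from outside (for example by the kernel when memory runs out). The sketch below is a Python analogue of the same distinction, not Aliro code; `describe_exit` and `run_and_kill` are hypothetical helper names, and it assumes a POSIX system where `SIGKILL` exists.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Tell a normal exit apart from termination by a signal.

    On POSIX, subprocess reports a negative return code -N when the
    child was terminated by signal N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_kill():
    """Start a sleeping child, kill it, and describe how it ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)  # simulate an external kill
    child.wait()
    return describe_exit(child.returncode)


if __name__ == "__main__":
    print(run_and_kill())
```

Logging the signal alongside the exit code in the process wrapper would make it easier to tell whether get_metafeatures crashed or was killed.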

jay-m-dev commented 1 year ago

The largest file I've been able to upload successfully via the GUI is 144MB. Using the data/datasets/user directory, I've been able to upload a 175MB file. One thing I noticed in the error above is that the get_metafeatures script is called with -identifier_type set to fileid, even when the dataset is loaded directly from the data/datasets/user directory. A filepath option exists, but it doesn't appear to be used. Passing the filepath instead might allow large datasets to be loaded directly.
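
To make the filepath idea concrete, here is a minimal sketch of a script that accepts the same flags seen in the logged args (`-target`, `-identifier_type`, `-prediction_type`) and, in the `filepath` case, streams the file from disk instead of fetching it by id. This is not the actual get_metafeatures.py: `build_parser`, `count_rows_and_columns`, and `run` are hypothetical names, the row/column count stands in for real metafeature extraction, and the `fileid` branch is left unimplemented.

```python
import argparse
import csv
import os


def build_parser():
    """Parser mirroring the flags visible in the logged args (assumed layout)."""
    parser = argparse.ArgumentParser(description="metafeature extraction sketch")
    parser.add_argument("identifier",
                        help="file id or path, depending on -identifier_type")
    parser.add_argument("-target", required=True)
    parser.add_argument("-identifier_type",
                        choices=["fileid", "filepath"], default="filepath")
    parser.add_argument("-prediction_type", default="classification")
    return parser


def count_rows_and_columns(path):
    """Stream the CSV row by row so the whole file is never held in memory."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def run(argv):
    args = build_parser().parse_args(argv)
    if args.identifier_type == "fileid":
        # Hypothetical: the real script would fetch the file by id here.
        raise NotImplementedError("fileid lookup is outside this sketch")
    if not os.path.isfile(args.identifier):
        raise FileNotFoundError(args.identifier)
    n_rows, n_cols = count_rows_and_columns(args.identifier)
    return {"n_rows": n_rows, "n_columns": n_cols, "target": args.target}
```

Called as `run(["data/datasets/user/big.csv", "-target", "target", "-identifier_type", "filepath"])` (the file name is illustrative), the script would read straight from the mounted directory without routing the file through an id lookup.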