cybergreen-net / pm

Tech project management repo (issue tracker only)

RDS disk space issue #106

Open kxyne opened 7 years ago

kxyne commented 7 years ago

It looks like we're short on space in the current RDS instance. Are there other DBs in it, @zelima, or do we need to re-instantiate with more disk?

```
Traceback (most recent call last):
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/luigi/worker.py", line 328, in check_complete
    is_complete = task.complete()
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/luigi/task.py", line 795, in complete
    return all(r.complete() for r in flatten(self.requires()))
  File "/home/cybergreen/etl2/cybergreen_data_pipeline.py", line 126, in requires
    aggregator.run()
  File "/home/cybergreen/etl2/aggregator/main.py", line 74, in run
    self.aggregate()
  File "/home/cybergreen/etl2/aggregator/main.py", line 206, in aggregate
    conn.execute(query)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 939, in execute
    return self._execute_text(object, multiparams, params)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1097, in _execute_text
    statement, parameters
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1394, in _handle_dbapi_exception
    exc_info
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/cybergreen/etl2/venv/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InternalError: (psycopg2.InternalError) Disk Full
DETAIL:
  -----------------------------------------------
  error:    Disk Full
  code:     1016
  context:  node: 0
  query:    882976
  location: fdisk_api.cpp:398
  process:  query0_20 [pid=16549]
  -----------------------------------------------
[SQL: "
INSERT INTO count
(SELECT
  date, risk, country, asn, count(*) as count, 0 as count_amplified
FROM (
  SELECT DISTINCT (ip), date_trunc('day', date) AS date, risk, asn, country FROM logentry) AS foo
GROUP BY date, asn, risk, country ORDER BY date DESC, country ASC, asn ASC, risk ASC)
"]
INFO: Informed scheduler that task RedShiftAggregation__99914b932b has status UNKNOWN
```
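One way to watch disk usage before a run like this is to query Redshift's `stv_partitions` system table, which reports per-partition `used` and `capacity` in 1 MB blocks. A minimal sketch of the percentage calculation (fetching the rows from the cluster is assumed; here they are plain tuples, and the sample numbers are hypothetical):

```python
def disk_used_pct(partitions):
    """Overall disk usage from (used, capacity) pairs, e.g. rows
    selected from Redshift's stv_partitions (values in 1 MB blocks)."""
    used = sum(u for u, _ in partitions)
    capacity = sum(c for _, c in partitions)
    if capacity == 0:
        return 0.0
    return 100.0 * used / capacity

# Hypothetical single dc1.large node: 160 GB is roughly 163840 blocks.
print(disk_used_pct([(73728, 163840)]))  # -> 45.0
```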
zelima commented 7 years ago

@kxyne This is not an RDS error but a Redshift one, and yes, we may be short on disk space there, as we currently run the default (smallest) node type.

Capacity details:

- Node type: dc1.large
- CPU: 7 EC2 Compute Units (2 virtual cores) per node
- Memory: 15 GiB per node
- Storage: 160 GB SSD per node
- I/O performance: Moderate
- Platform: 64-bit
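For sizing a replacement cluster, the back-of-envelope arithmetic is simple: with 160 GB of SSD per dc1.large node, divide the dataset size by the usable space per node. A sketch (the dataset sizes and the 25% headroom factor are made-up assumptions, not project figures; Redshift does need scratch space for sorts and vacuum):

```python
import math

def nodes_needed(data_gb, per_node_gb=160, headroom=0.25):
    """Nodes required to hold data_gb while keeping some
    free-space headroom on each node for query scratch space."""
    usable = per_node_gb * (1 - headroom)
    return max(1, math.ceil(data_gb / usable))

print(nodes_needed(100))  # -> 1
print(nodes_needed(300))  # -> 3
```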

This is the screenshot after today's run (screenshot attached):

kxyne commented 7 years ago

That's definitely the cause; however, it seems to sit at 45% full all the time.

Is there another DB on it? I'll spin up a new Redshift cluster for this run, but we need to clean up any old datasets on it too.


(screenshot: diskfill)

zelima commented 7 years ago

@kxyne My guess is that the first query, which loads the scanned data into the logentry table, executed fine, and the second query ran out of space mid-execution and exited with an error, leaving the first table full. I checked the dev database just now and the logentry table is indeed full.

The aggregator script drops all tables before it starts loading, and again after the aggregated data has been unloaded to S3 successfully.

I dropped the logentry table manually and the used disk space went back toward 0.
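One way to avoid this failure mode is to drop the staging table in a `finally` clause, so even a run that dies mid-aggregation releases its disk. A sketch assuming a SQLAlchemy-style connection (the function name and statements are illustrative, not the aggregator's actual code):

```python
def aggregate_with_cleanup(conn, load_sql, aggregate_sql):
    """Run the load and aggregation queries, always dropping the
    staging table afterwards so a failed run cannot leave logentry
    holding the cluster's disk."""
    try:
        conn.execute(load_sql)
        conn.execute(aggregate_sql)
    finally:
        # Runs on success AND on failure (e.g. Disk Full mid-aggregation).
        conn.execute("DROP TABLE IF EXISTS logentry")
```

The trade-off is that a failed run can no longer be resumed from the loaded data, but on a cluster this small, reclaiming the disk is the safer default.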

(screenshot attached)

kxyne commented 7 years ago

Ah, so it's full because of the previous run. That makes sense. Sorry, a bit tired :)