RDCEP / EDE

MIT License

write netcdf_etl.py analogous to shape_etl.py, not quite done though #1

Closed ghost closed 8 years ago

ghost commented 8 years ago

do not merge yet, this still needs quite some review (see comments below):

implemented ingestion of netcdf into postgres using ogr2ogr. along with that, implemented a class to represent netcdf metadata, i.e. NetcdfMetadata, analogous to ShapeMetadata. not yet fully done with that class though.

ghost commented 8 years ago

i used the sqlalchemy string type for some of the columns of the netcdf metadata table (see models.py).

what still needs to be done (especially in models.py):

njmattes commented 8 years ago

Is it worth using arrays instead of comma-separated strings (e.g. dims_names, dims_lengths), or is that just over-complicating things? Also, is there a reason the NetcdfETL class doesn't inherit from object?
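For readers following the trade-off, here is a minimal sketch of the comma-separated approach in plain Python. The helper names are hypothetical and are not taken from models.py; only dims_names and dims_lengths come from the thread, and the sample values are made up for illustration.

```python
# Hypothetical helpers for illustration; not part of the EDE code base.

def join_values(values):
    """Encode a list of values as one comma-separated string."""
    return ",".join(str(v) for v in values)


def split_names(text):
    """Decode a comma-separated string into a list of names."""
    return text.split(",") if text else []


def split_lengths(text):
    """Decode a comma-separated string into a list of integers."""
    return [int(part) for part in split_names(text)]


# Made-up sample dimensions for a single NetCDF file.
dims_names = join_values(["time", "lat", "lon"])
dims_lengths = join_values([12, 360, 720])
```

The cost of this encoding is that every reader has to decode the string again, and a name that itself contains a comma would break the round trip.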

ghost commented 8 years ago

yep, i overlooked the ARRAY, using it now... afaics there is no real reason why the NetcdfETL class (and many other classes which were there before) do not inherit from object...

ghost commented 8 years ago

there is indeed a problem with using ARRAY: dims_names, dims_lengths, ... will very likely be arrays of different lengths for different NetCDFs.

njmattes commented 8 years ago

Oh bummer. I didn't realize ARRAY required its values to be the same length. The various class methods like get_all_with_etl_status are executing commands in SQL strings rather than using the ORM because the tasks are under a different scoped session? Am I understanding that right?

njmattes commented 8 years ago

Also, I don't know that inheriting from object is a big deal, but not doing so can cause issues with @property decorators in Python 2, among other things. Maybe you're using 3?
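A small sketch of the @property concern, using a hypothetical Dimension class that is not part of the EDE code base. In Python 2, a class that does not inherit from object is old-style: property getters still work, but assignment bypasses the setter and writes straight into the instance dictionary, so validation like the one below would be skipped silently. Inheriting from object avoids that; in Python 3 every class is new-style, so the explicit base is harmless there.

```python
class Dimension(object):  # hypothetical class, for illustration only
    """Inherits from object explicitly, so it is new-style on Python 2 too."""

    def __init__(self, length):
        self.length = length  # routed through the setter below

    @property
    def length(self):
        return self._length

    @length.setter
    def length(self, value):
        if value < 0:
            raise ValueError("length must be non-negative")
        self._length = value


def rejects_negative():
    """Return True if the setter blocks an invalid assignment."""
    dim = Dimension(3)
    try:
        dim.length = -1
    except ValueError:
        return True
    return False
```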

ghost commented 8 years ago

yes, these class methods, like get_all_with_etl_status, use caller_session.execute instead of .query, so they're not using the ORM. the reason, AFAICS, is that there are not yet classes written to express the returned columns of these queries.

yes, you're right, the .execute statements in e.g. get_all_with_etl_status are run on a different scoped session than the .query calls in tasks.py (see also the comment "A note on caller_sessionin..." in models.py, so i should have copy-pasted that comment to NetcdfMetadata as well).

it's using python 2 and has no @property decorators, so we're good. i'll just create an issue for that.