david-fisher / 320-F19-Track-I

Track I's group repo

Implement Automatic Uploads to Hobodata Table #83

Open · GitThePower opened this issue 4 years ago

GitThePower commented 4 years ago

Data is already streaming into S3 and getting parsed. A function needs to be implemented to upload the data to the data table whenever it comes in. Additionally, either change the format of how the data gets parsed to suit the data table, or (recommended) change the format of the data table to fit how the data gets parsed.
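A minimal sketch of what that upload function could look like, assuming the exports land in S3 as CSV, the data table is a DynamoDB table (called `Hobodata` here purely for illustration), and the function runs as a Lambda subscribed to the bucket's `ObjectCreated` events; the column names are hypothetical and would need to match the real HOBOlink export headers:

```python
import csv
import urllib.parse

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Hobodata")  # hypothetical table name

def handler(event, context):
    """Fires on each S3 ObjectCreated event: fetch the new CSV export,
    parse it, and write one item per row to the data table."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        with table.batch_writer() as batch:
            for row in csv.DictReader(body.splitlines()):
                # "Date" as the key column is a placeholder; use whatever
                # timestamp header the actual export contains.
                batch.put_item(Item={
                    "timestamp": row["Date"],
                    "readings": {k: v for k, v in row.items() if k != "Date"},
                })
```

Taking the recommended option (shaping the table around the parsed data) keeps this mapping a thin passthrough rather than a reformatting step.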

alisakotliarova commented 4 years ago

Hobolink offers an option where we could have information delivered on a schedule to an FTP/SFTP server instead. AWS Transfer is a fully-managed SFTP service that will allow us to easily stream this data into S3. A Lambda function will then parse this data, write it to our database, and then delete the file from S3.
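One way the S3-to-Lambda leg of that pipeline could be wired up, sketched with boto3 and entirely placeholder names (bucket and function ARN), is to point the bucket's event notifications at the parsing Lambda; the function also needs an add_permission grant so S3 is allowed to invoke it:

```python
import boto3

s3 = boto3.client("s3")

# Bucket name and function ARN below are placeholders.
s3.put_bucket_notification_configuration(
    Bucket="hobolink-sftp-drop",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:parse-hobodata",
            "Events": ["s3:ObjectCreated:*"],
            # Only fire on the CSV exports HOBOlink delivers over SFTP.
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}},
        }]
    },
)
```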

dergoose commented 4 years ago

Isn’t this the implementation we currently have?

david-fisher commented 4 years ago

The current setup emails the data to some Gmail address, then parses it on S3, and then maybe stores it (Henry's workaround plus this original issue). We did not go with the SFTP transfer option because of credential issues and some problem Cole couldn't solve that I don't remember and that was supposed to get documented.

Dan Cooley asked a colleague at Cornell:

In the meantime, the undergrads have had a hard time getting data in and out of the Onset server. There’s something about the API that isn’t working for them. I mentioned to the instructor, David Fisher, that the Northeast Reg. Climate Center seemed to have solved that problem, at least the problem of getting data. He asked if you could tell us how you did it. So, here I am, asking if that’s possible.

I will update as I get more information from them.

alisakotliarova commented 4 years ago

I assumed that was the current process based on issue #82, which goes into detail about the email forwarding method that David described.

david-fisher commented 4 years ago

Apparently, unbeknownst to us all (except Veronica), hobolink.com was connecting to a VERY expensive Transfer/SFTP server and uploading data on an hourly basis. Since the uploaded data is never processed or deleted, it has just piled up, taking excessive storage. Unfortunately, Paul was not monitoring the billing console.

On reviewing the HOBO documentation, the Onset weather station is an RX model. It cannot be accessed with the API; the only way to get its exported data is via SFTP or email. So, tech support told Veronica the truth. Stupid software tricks.

Right now, the SFTP server is being disabled because it is too expensive. I have turned off automatic uploads from the hobolink.com account.

Quoting from their page:

With an RX3000 or RX2100 station, data uploaded to HOBOlink is saved to a database, with data points saved at each logging interval. This gives you the flexibility to choose the precise sensor data you want from any point or any number of stations in a custom export that you can view in other programs or share with others. For example, you can set up an export that includes all sensor data or one that has only smart sensor data. Or, you can set up one export that has just temperature data and another that has just barometric pressure data. There are numerous ways to customize your data. You can set up or access existing exports from the Exports tab on the device page or by clicking Data and then Exports (see Exporting Data). You can also have it automatically delivered to any email address or FTP/SFTP location on a schedule that you select (see Scheduling Data Delivery).

Important: It is recommended that you use scheduled data downloads to archive all the data in a safe location as a backup. Consider setting up an export that includes all sensor data (see Exporting Data) and is delivered to an email address or FTP/SFTP location on a regular schedule (see Scheduling Data Delivery). You can then keep those files to retain a full data history as needed.

david-fisher commented 4 years ago

To give you an idea, the one-week download size is 860 KB (duh):

    dfisher@wpa199 ~ % ls -l Downloads/*.csv
    -rw-r--r--@ 1 dfisher staff 857706 Feb 27 11:22 Downloads/20699245___Over_the_last_week_2020_02_27_16_22_29_UTC_1.csv

Clearly we need to be cleaning these up after the records get inserted into the DB.
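A rough sketch of that cleanup, assuming boto3 and a placeholder bucket name: delete each export only after its rows are committed, plus a one-off sweep for the backlog that accumulated while nothing was deleting:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "hobolink-sftp-drop"  # placeholder bucket name

def delete_after_insert(key):
    """Remove a single export, called only once its rows have been
    committed to the DB so a failed insert never loses data."""
    s3.delete_object(Bucket=BUCKET, Key=key)

def sweep_backlog(prefix=""):
    """One-off pass over the files that piled up while nothing was
    deleting them; paginated since there may be thousands of keys."""
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```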