GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

getL0tx sorts and writes new files for every station, regardless of transmission #114

Open patrickjwright opened 1 year ago

patrickjwright commented 1 year ago

In l3_processor.sh we are looking to process recently fetched files only, with:

echo "Finding recently fetched files..."
IMEIs=$(find ./tx -maxdepth 1 -type f -mmin -60 | cut -d"/" -f3)

However, the sorting in getL0tx will write new files back to aws-l0/tx for every station, every time. Even if no duplicate lines were found, it still writes the newly sorted file back, and updates the modified time that -mmin looks at. Therefore, we are processing all stations each time whether or not new transmissions were received.
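To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use pypromice: `rewrite_unchanged` is a hypothetical stand-in for an unconditional sort-and-write step, `modified_within` is a rough approximation of the `-mmin -60` test, and the file name is made up.

```python
import os
import tempfile
import time


def rewrite_unchanged(path):
    """Sort a file's lines and write them back unconditionally.

    Hypothetical stand-in for a sort step that always rewrites the file,
    even when the sorted content is identical to what was already there.
    """
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        f.writelines(sorted(lines))


def modified_within(path, minutes):
    """Rough approximation of find's `-mmin -N` test."""
    return time.time() - os.path.getmtime(path) < minutes * 60


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, for illustration only
    path = os.path.join(tmp, "station_example.txt")
    with open(path, "w") as f:
        f.write("a\nb\n")

    # Pretend the last real transmission arrived two hours ago
    old = time.time() - 2 * 3600
    os.utime(path, (old, old))

    before = modified_within(path, 60)  # file looks stale
    rewrite_unchanged(path)             # content is identical after the rewrite
    after = modified_within(path, 60)   # file now looks freshly fetched

    print(before, after)
```

Under these assumptions the file's content never changes, yet it moves from "not recent" to "recent" purely because it was written back.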

One possible solution: In getL0tx we can easily keep track of which stations received a new transmission, and then run sortLines only for those stations. In this section:

        for k,v in aws.items():
            if str(imei) in k:
                if v[0] < d < v[1]:
                    print(f'AWS message for {k}.txt, {d.strftime("%Y-%m-%d %H:%M:%S")}')
                    l0 = L0tx(message, formatter_file, type_file)

Use k to strip out the station ID string for any station with a transmission (at the same level as the print, once both if conditions have passed) and append it to a rec_tx list. Then, in the very next section, add some logic to run sortLines only for these stations.
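A rough sketch of that idea, kept independent of pypromice so it runs on its own. The assumptions here are mine: `sort_lines` is a hypothetical stand-in for sortLines (its real signature is not shown in this thread), the station keys and IMEIs are made up, and the full key `k` is stored as-is because the exact key format needed to strip out the station ID is not shown either.

```python
from datetime import datetime


def stations_with_new_tx(aws, messages):
    """Collect the keys of stations that received at least one valid message.

    aws:      dict mapping a station key (containing the IMEI) to a
              (start, end) datetime window, mirroring the loop in getL0tx.
    messages: iterable of (imei, datetime) pairs; hypothetical input shape.
    """
    rec_tx = []
    for imei, d in messages:
        for k, v in aws.items():
            if str(imei) in k:
                if v[0] < d < v[1]:
                    # Same level as the print in getL0tx: both conditions passed
                    if k not in rec_tx:
                        rec_tx.append(k)
    return rec_tx


def sort_lines(station, log):
    """Hypothetical stand-in for sortLines; only records which station ran."""
    log.append(station)


# Made-up stations and IMEIs, for illustration only
aws = {
    "STN_A_111": (datetime(2020, 1, 1), datetime(2030, 1, 1)),
    "STN_B_222": (datetime(2020, 1, 1), datetime(2030, 1, 1)),
}
messages = [
    (111, datetime(2023, 5, 1, 12, 0)),
    (111, datetime(2023, 5, 1, 13, 0)),
]

rec_tx = stations_with_new_tx(aws, messages)

# Sort only the stations that actually received something
sorted_log = []
for station in rec_tx:
    sort_lines(station, sorted_log)

print(rec_tx)
```

With this shape, files for stations that received nothing are never rewritten, so their modified times stay untouched and the -mmin filter in l3_processor.sh would skip them.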

PennyHow commented 1 year ago

For reference, this is where sortLines is called in bin/getL0tx:

https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/c5613a942d7e79089865f4f5837313bf6abe3461/bin/getL0tx#L137-L145