CroatianMeteorNetwork / RMS

RPi Meteor Station
https://globalmeteornetwork.org/
GNU General Public License v3.0

FITS Handling: Introduce Lossless Compression for HDUs #278

Open Cybis320 opened 3 months ago

Cybis320 commented 3 months ago

This PR introduces lossless compression algorithms for Header/Data Units (HDUs) within our FITS file handling module ('RMS/Formats/FFfits.py'). The goal is to reduce FITS file sizes to improve storage efficiency, mitigate wear and tear on storage devices, and enhance data transfer speeds, all while maintaining the data integrity and accessibility.

Changes Made:

Impact:

Testing:

g7gpr commented 3 months ago

Is this expected to be safe to run on stations that will upload, or should upload be turned off when testing this?

markmac99 commented 3 months ago

Can you please share some example compressed files? I'd like to check they're still compatible with other FITS viewing and analysis software.

Cybis320 commented 3 months ago

Here are four FF FITS files. Unfortunately, it's cloudy here right now. It would be interesting to see what the sizes are on a starry night; the size reduction might not be as dramatic.

  1. 1080p Uncompressed: 8.3 MB
  2. 1080p Compressed: 3.4 MB
  3. 720p Uncompressed: 3.7 MB
  4. 720p Compressed: 1.7 MB

[ Archive 2.zip ]
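For reference, the reduction implied by the four sizes listed above can be worked out with a few lines of Python. This is only arithmetic on the quoted numbers (cloudy-night samples), not a measurement; the helper name `reduction_percent` is made up for illustration.

```python
# File sizes in MB as quoted in the list above (cloudy-night samples).
sizes_mb = {
    "1080p": (8.3, 3.4),
    "720p": (3.7, 1.7),
}


def reduction_percent(uncompressed, compressed):
    """Return the size reduction as a percentage of the uncompressed size."""
    return 100.0 * (1.0 - compressed / uncompressed)


reductions = {
    name: round(reduction_percent(raw, packed), 1)
    for name, (raw, packed) in sizes_mb.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% smaller")
```

That works out to roughly 59% for the 1080p sample and 54% for the 720p sample.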

dvida commented 3 months ago

Dave, I suggest testing this separately until we can confirm that there are no effects on processing.

markmac99 commented 3 months ago

As noted by email, I have found that the compressed files are incompatible with other FITS handling software: both FITS Liberator and PixInsight refuse to open them (they actually crash FL!). For me this feels like an issue, as we don't know what the files might be used for downstream of RMS.

I'm also wondering whether compression is really beneficial. Storage is cheap, 3.5MB is not really that big, and the data are bzip compressed for upload to GMN, so I don't think it will save space on the GMN side or save time in uploads. I agree it means less data getting written to disk each night, but the lifespan of any decent SD card is pretty long (several years) and I've actually never had a card fail due to wear and tear.

Cybis320 commented 3 months ago

On my end, CMNbinViewer, SAOImageDS9, and FITS Liberator have no issues handling it. I don't have access to PixInsight, but PixInsight stated 'we have no interest in FITS, which we deprecated many years ago in PixInsight'.

Is anyone else having any issues? It would be strange for a FITS application to not support such a basic requirement.

Mark, you're making a very good point about the data being compressed before upload. The compressed and the uncompressed FITS files, once packed into a tar.bz2, have similar sizes. So there is indeed no benefit as far as transmitting the files is concerned. I don't know how the tar files are handled on the receiving end.
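The effect described here (a second compression pass gaining almost nothing) can be illustrated with a rough, standard-library-only sketch. Everything in it is an assumption made for illustration: the payload is synthetic noise rather than a real FF file, and zlib stands in for the on-disk compression only because it ships with Python; it is not the algorithm used by this PR.

```python
import bz2
import random
import zlib

# Synthetic stand-in for image data: 200,000 bytes limited to 16 grey levels,
# so the payload is compressible. Real FF files will behave differently.
rng = random.Random(0)
raw = bytes(rng.choices(range(16), k=200_000))

# Stand-in for a file that was already compressed on disk.
precompressed = zlib.compress(raw, 9)

# What a bzip2 archive step would do to each variant before upload.
upload_raw = bz2.compress(raw)
upload_pre = bz2.compress(precompressed)

print("on disk, uncompressed:  ", len(raw))
print("on disk, pre-compressed:", len(precompressed))
print("upload size from uncompressed:  ", len(upload_raw))
print("upload size from pre-compressed:", len(upload_pre))
```

The on-disk sizes differ a lot, while the two upload sizes end up in the same ballpark, because bzip2 cannot shrink data that has already been compressed.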

Regarding local storage, it would clearly be beneficial. So I think it would be valuable to get to the bottom of the compatibility issue before pulling this out.

Luc

markmac99 commented 3 months ago

PixInsight will keep supporting FITS for a long time, although the PixInsight guys have been pushing their own private format for years. Nobody else is interested in it! My point was that PixInsight is a -very- widely used tool in the astro-imaging world, and if it can't open the files then it's a problem.

I think the problem with FITS Liberator is that there are two versions: version 3, distributed by ESA from their website, can't open the compressed files, while version 4 from NOIRLab can handle them.

Personally, I don't agree there's much advantage in compressing the files, because storage is cheap and the days when 3-4MB was a big file are long gone. Saving 1-1.5MB per file isn't really very significant. I realise it'd mean we could keep the CapturedFiles data for a bit longer, but it's very rare that we need to look back more than a few days, and realistically we'd only be able to keep an extra day or so.

Anyway, for me, I would want this to be an optional feature.

g7gpr commented 3 months ago

OK, will do.


dvida commented 3 months ago

Hi all, Thank you for the thorough discussion. I agree about making this optional and releasing it to the community. We should announce this new feature and see what kind of feedback we get. If everyone is happy, we can make it the default. Squeezing a few more days of FF files could be very useful, as sometimes we're late to react and the data is gone forever. Any increases in this time window are very welcome. @Cybis320, could you add it as an option in the config file?

Cybis320 commented 3 months ago

I added an 'hdu_compress' configuration setting to the [Compression] section of .config (defaults to False). I also made a command-line utility to convert all FITS files in a directory to uncompressed FITS: RMS/Utils/ConvertCompressedFits.py
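As a rough illustration of how such an option behaves, here is a minimal sketch that reads the flag with Python's standard `configparser`. Only the section name `[Compression]` and the option name `hdu_compress` come from the comment above; the config excerpt and the helper `read_hdu_compress` are hypothetical and are not the actual RMS config reader.

```python
import configparser

# Hypothetical excerpt of a station .config file; only the section and option
# names come from the comment above, the rest is illustrative.
CONFIG_TEXT = """
[Compression]
hdu_compress: true
"""


def read_hdu_compress(config_text):
    """Return the hdu_compress flag, falling back to False when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("Compression", "hdu_compress", fallback=False)


enabled = read_hdu_compress(CONFIG_TEXT)     # option present and switched on
default = read_hdu_compress("[Capture]\n")   # option missing, so the default applies

print("explicitly enabled:", enabled)
print("option missing:", default)
```

Falling back to False keeps existing stations on uncompressed files unless the operator opts in.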

markmac99 commented 3 months ago

Will try it out!

Cybis320 commented 3 months ago

Update on compression ratio: my camera sees stars tonight and the compression still reduces the size by half.

dario-zubovic commented 2 weeks ago

Worth noting that AstroPy also supports full-file compression with gzip, zip or bzip2. Such files have .fits.gz/.fits.zip/.fits.bz2 extension.

IMO it should be the preferred approach:

As for the justifiability question, I would agree that reducing the size of individual files will be helpful. Besides retaining more nights on local storage before they're cleaned, it would also help with the lifespan of SD cards, as they can only survive a limited number of write cycles.

markmac99 commented 2 weeks ago

As I've said earlier, I do not see any real benefit to compressing the FITS files, and so would prefer this to be an option and not the default. I understand that it'd save a bit of space but, as I've noted before, disk space is cheap and it's rare that we need to recover raw data from more than a few days ago, especially now that we have the EventMonitor in place and can recover data in near-realtime without camera-operator intervention. I understand the point about less wear and tear, but in my experience this isn't really a significant problem, plus many owners are migrating to Linux and/or SSD, which probably mitigates that issue anyway.

Finally, bear in mind that RMS is not the only app that reads/writes the FITS files. If we change the file format it could have unexpected impacts on how end users process the data, in ways we cannot guess because we do not know what downstream processes the individual country networks have in place (we in UKMON don't use the FITS files, but it's possible others do). I'm sure it'd be a simple fix, but I am never in favour of introducing incompatibility.

adalava commented 2 weeks ago

In my use case, reducing storage size by ~50% is beneficial. Storage isn't cheap here, and it would allow us to use 64GB or even 32GB microSD cards on the Pis.

It seems that the required I/O write throughput will be reduced as well, so we will be able to add more RMS instances on an x86_64 server, at the cost of higher CPU usage.

I really like this patch, but I'd suggest collecting more data before making it the default. For instance, I'd like to see how the extra CPU usage will affect the Pis, especially the Pi3, as they are at maximum load already. Compatibility with other software, as Mark said, is also important. Some people would prefer to have compression disabled for now, so they need to be aware and choose what to do. I think most people don't use any external software, so I'm in favor of having it as the default as long as it doesn't break any existing RMS station.

I'll be happy to test it on BR0001, BR0002 and BR0003.

dario-zubovic commented 2 weeks ago

To clarify what I meant above: it's not required to change the file format to get the benefits of compression. As such, I would argue against this patch with the current implementation of HDU-only compression.

The overall idea makes sense though. AstroPy can write files directly to disk in the current FITS format, but already inside a generic compressed archive. I.e., the best of both worlds: no changes to the underlying format and smaller individual files. It will result in a better compression ratio as well. I can open a branch if anybody is curious to test full-file compression?

adalava commented 2 weeks ago

Hm! Yes, it's a good balance if it can write the compressed file directly (and not by writing the full uncompressed file, then compressing it and deleting the full file). It would also be great if the other tools like SkyFit, the bin viewer, star detection and so on were able to decompress it directly into RAM, without expanding it to disk first.

I would like to test it as well!
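The two properties asked for here (compressing while writing, and decompressing straight into memory) can be sketched with Python's standard `gzip` module. This is an illustration under stated assumptions, not the proposed implementation: the payload is a placeholder block of zeros instead of a real FITS file, AstroPy is not used, and the file name `example.fits.gz` is made up.

```python
import gzip
import io
import os
import tempfile

# Placeholder payload standing in for the bytes of an uncompressed FITS file.
# FITS files are laid out in 2880-byte blocks, hence the size chosen here.
# Zeros compress far better than any real image would.
payload = bytes(2880 * 100)

path = os.path.join(tempfile.mkdtemp(), "example.fits.gz")

# Stream the bytes through gzip as they are written, so no uncompressed copy
# ever lands on disk.
with gzip.open(path, "wb") as gz_file:
    gz_file.write(payload)

# Decompress straight into RAM; the buffer can be handed to any reader that
# accepts a file-like object.
with gzip.open(path, "rb") as gz_file:
    in_memory = io.BytesIO(gz_file.read())

size_on_disk = os.path.getsize(path)
round_trip_ok = in_memory.getvalue() == payload

print("bytes on disk:", size_on_disk)
print("round trip intact:", round_trip_ok)
```

Whether every RMS tool can consume such an in-memory buffer would still need to be checked tool by tool.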

markmac99 commented 2 weeks ago

I'd be interested to test it too. I agree we should not change the file format. I'd also argue strongly for using a common compression algorithm like gzip or zip to ensure portability across platforms. I know there are better algorithms, but they are sometimes not supported across Windows, macOS, and the various Linux flavours. It definitely needs to be tested on a Pi3, as it might require too much memory or processor capacity. While the Pi3 is now very old, there are still a lot of RMS stations using them. I don't have any working Pi3 station, however :(
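A small timing harness along the following lines could be run on a Pi3 to compare candidate algorithms before committing to one. It is a sketch under assumptions: the sample data is a synthetic repeating ramp (a real FF file should be substituted), the codec list is only an example, and it measures elapsed time and output size but not memory use.

```python
import bz2
import gzip
import time


def benchmark(data, codecs):
    """Return {name: (compressed_size, seconds)} for each codec."""
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(compressed), elapsed)
    return results


# Example codecs from the standard library; gzip uses DEFLATE, the same
# algorithm commonly used inside zip archives.
codecs = {
    "gzip level 1": lambda d: gzip.compress(d, compresslevel=1),
    "gzip level 9": lambda d: gzip.compress(d, compresslevel=9),
    "bz2 level 9": lambda d: bz2.compress(d, 9),
}

# Placeholder data: a repeating ramp, not a real FF frame.
sample = bytes(range(256)) * 4000

results = benchmark(sample, codecs)
for name, (size, seconds) in results.items():
    print(f"{name}: {size} bytes in {seconds * 1000:.1f} ms")
```

Running the same script on each board type with a real night's files would give comparable numbers for the CPU side of the trade-off.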