Open marklit opened 6 months ago
Hi Mark, thank you for making this issue.
While I am not, in principle, opposed to offering the data in other formats, before considering something like this I need the files to have their 'gaps' accounted for.
As you know, when readsb restarts for any reason (a configuration change being the most common), one readsb instance (let's say -0) will go down while the other keeps running. Then, once -0 is back, -1 will go down and restart. This results in a few minutes of unique data in each file, which is why they are both there.
So basically, I need to solve this problem first with the globe_history format before moving forward.
Make sense?
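The merge the maintainer describes could be sketched roughly as follows: the -0 and -1 instances each capture short windows of unique data, so a combined file has to union the two point streams and drop duplicates. This is a minimal illustration only, assuming per-aircraft trace points whose first element is the timestamp (`merge_traces` is a hypothetical helper, not part of readsb):

```python
def merge_traces(points_0, points_1):
    """Union two per-aircraft trace point lists, keyed on timestamp.

    Points present in both files are kept once (the -0 copy wins);
    points unique to either file survive the merge.
    """
    merged = {p[0]: p for p in points_0}
    for p in points_1:
        merged.setdefault(p[0], p)
    return [merged[t] for t in sorted(merged)]
```

The real problem is harder than this sketch (points may not share identical timestamps across instances), but it shows the shape of the dedup-and-union step.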
On Tue, 5 Mar 2024 at 09:47, Mark Litwintschik wrote:
I built an ETL script that turns the current download into a parquet file. It has names for every field, is columnar-formatted so it is much quicker to query and it is compressed with ZStandard so a day's worth of data is still around 1.2 GB.
https://tech.marksblogg.com/global-flight-tracking-adsb.html
Is there any chance the above ETL script could work its way into your infrastructure and produce a daily Parquet file in addition to the current daily download tar file?
Hey nice blog post! :)
If you're going to make such a nice new format, you should include info on whether the airplane is on the ground.
```python
'altitude':
    trace[3]
    if str(trace[3]).strip().lower() != 'ground'
    else None,
```
I didn't see that saved anywhere. Possibly just a bool in your schema?
You probably already referenced it while using the data, but here is some explanation of the format: https://github.com/wiedehopf/readsb/blob/dev/README-json.md#trace-jsons The aircraft object is only present for every 4th point, but I assume you didn't need much data from there / your DB schema handles that somehow.
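Since the aircraft object only appears on some points, an ETL pass would need to carry it forward to the points in between. A hedged sketch, assuming the object sits at index 8 of each trace point (per the readsb README linked above) and is null when absent; `fill_aircraft` is an illustrative name, not readsb API:

```python
def fill_aircraft(points):
    """Forward-fill the periodic aircraft object (index 8) so that
    every point is paired with the most recent metadata seen."""
    last = None
    filled = []
    for p in points:
        if len(p) > 8 and p[8] is not None:
            last = p[8]  # remember the latest aircraft object
        filled.append((p, last))
    return filled
```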
Also sorry for the format, it's a bit of a mess.
@marklit of course nothing is preventing you from tackling this project yourself and making the parquet-ready data available similar to this repo. :)
@marklit, I've created a ClickHouse database with the data and also added it to ADS-B Exposed: https://github.com/ClickHouse/adsb.exposed/ Contact me if there are further ideas.
I built an ETL script that turns the current download into a parquet file. It has names for every field, is columnar-formatted so it is much quicker to query and it is compressed with ZStandard so a day's worth of data is still around 1.2 GB. There are also H3 indices which help filter specific geographies quickly.
https://tech.marksblogg.com/global-flight-tracking-adsb.html
Is there any chance the above ETL script could work its way into your infrastructure and produce a daily Parquet file in addition to the current daily download tar file?