Open reidsunderland opened 5 months ago
It should only be looking for BUFR on the first few lines... basically there is an AHL... there may be a header, typically some numbers and carriage returns, and perhaps a control character... then the data itself starts with BUFR (or GRIB).
It should not match any occurrence of BUFR at the beginning of any line in a bulletin.
wait... this is in Sundew? wow... been there for 20 years, and nobody noticed? anyways... this stuff was ported to sr3, and it likely uses the same logic.
Probably just doing splitlines()[0:3] is enough to fix the problem in 99% of cases. A truly correct fix is harder.
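A minimal sketch of that "first few lines only" idea, not the actual Sundew or sr3 code. The function name `looks_binary`, the `max_lines` parameter, and the sample bulletins are hypothetical; the marker list is taken from the issue description. It does not strip leading control characters, so it only covers the simple case.

```python
# Markers named in the issue description; b"\211PNG" is the PNG signature prefix.
BINARY_MARKERS = (b"BUFR", b"GRIB", b"\211PNG")


def looks_binary(bulletin: bytes, max_lines: int = 3) -> bool:
    """Return True if one of the first `max_lines` lines starts with a marker.

    Hypothetical helper: lines beyond `max_lines` are never inspected, so a
    text bulletin that mentions BUFR further down is not misclassified.
    """
    for line in bulletin.splitlines()[:max_lines]:
        if line.startswith(BINARY_MARKERS):
            return True
    return False


# Made-up sample data for illustration only.
text_bulletin = (
    b"NOCN04 CWAO 081718\n"
    b"GENOT NO. 001\n"
    b"FIRST PARAGRAPH\n"
    b"SECOND PARAGRAPH\n"
    b"BUFR PRODUCTS WILL BE DELAYED\n"
)
binary_bulletin = b"IUXX01 CWAO 081718\nBUFR\x00\x00\x10\x04"

print(looks_binary(text_bulletin))
print(looks_binary(binary_bulletin))
```

The cutoff of three lines is the guess from the comment above; a bulletin whose header spans more lines before the binary payload would need a larger `max_lines` or a proper header parser.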
Maybe it is looking at all lines because of collections? I don't really know much about collections or how Sundew handles them.
Collections are always of the same type of bulletin: they are either all TAC (traditional alphanumeric code) or all binary.
The way to do collections with BUFR is just to catenate all the records together, so a collected BUFR would just start with one BUFR.
When a text bulletin contains a line that starts with "BUFR" (or "GRIB" or "\211PNG"), Sundew incorrectly determines that it's a binary bulletin, and truncates the data.
https://github.com/MetPX/Sundew/blob/efcb5601ef1af945a188a8825514339b53723572/lib/bulletin.py#L522-L529
Example input:
The log entry:
The data that gets ingested is:
https://dd.weather.gc.ca/bulletins/alphanumeric/20240515/NO/CWAO/16/NOCN04_CWAO_081718___01878