MetPX / Sundew

WMO-GTS switch on TCP/IP protocols (largely obsolete, legacy code.)
GNU General Public License v2.0

bulletin-file receiver incorrectly detects text bulletin as a BUFR #20

Open reidsunderland opened 5 months ago

reidsunderland commented 5 months ago

When a text bulletin contains a line that starts with "BUFR" (or "GRIB" or "\211PNG"), Sundew incorrectly determines that it's a binary bulletin, and truncates the data.

https://github.com/MetPX/Sundew/blob/efcb5601ef1af945a188a8825514339b53723572/lib/bulletin.py#L522-L529

Example input:

NOCN04 CWAO 081718

GENOT TLTP. NO. 008 CCC

BILINGUAL MESSAGE - FRENCH TEXT TO FOLLOW ENGLISH TEXT
MESSAGE BILINGUE - LE TEXTE FRANCAIS SUIT LE TEXTE ANGLAIS

EFFECTIVE JUNE 25, 2024, THE METEOROLOGICAL SERVICE OF CANADA WILL
BEGIN REGULARLY TRANSMITTING NEW BULLETINS OF HOURLY AUTOMATED
SURFACE WEATHER OBSERVATIONS FROM RECRUITED CANADIAN SHIPS IN THE
BUFR FORMAT. THESE MESSAGES WILL FOLLOW WMO BUFR TEMPLATE TM308014.

THE BULLETIN HEADERS FOR HOURLY SHIP OBSERVATIONS WILL BE:
ISSA20 CWAO ISSB20 CWAO
ISSC20 CWAO
ISSD20 CWAO
ISSI20 CWAO
ISSJ20 CWAO
ISSK20 CWAO ISSL20 CWAO

THE BULLETIN HEADERS FOR SHIP OBSERVATIONS TRANSMITTING INTRA-HOUR
SIGNIFICANT WEATHER (STORM AND SPREP) OBSERVATIONS WILL BE:
ISSA40 CWAO ISSB40 CWAO
ISSC40 CWAO
ISSD40 CWAO
ISSI40 CWAO
ISSJ40 CWAO
ISSK40 CWAO ISSL40 CWAO

THE FOLLOWING ARE THE STATION IDENTIFIERS FOR SHIPS CURRENTLY 
RECRUITED INTO THE PROGRAMME AND DISSEMINATED UNDER THE HEADERS 
LISTED ABOVE. THE IDENTIFIER USED IS THE SHIP OBERVATIONS TEAM 
(SOT) IDENTIFIER.

XMD7RGF UWBR5ZS DXY62VX DXJTY4L 6AK8NVU MFTW9ZM USFAPDJ FVU8WJS 
VCHQSQP 8QVU8QS ADC9EHA SDTNUWW SLTUAJL SGGQ4QX HUGT78Q 2AZY7HU 
HCW6ZCH ZJMN7RS PZBN6JW 2CXJVYJ ZTNDLFM YBVEWGM 6WUKLPV BAY6U5W 
VKCX8TW GYJN8YD 7SXVXXY YRQAXKE 9BEUB6Y 2QDZMLH KYGEJUP 8UWN4HQ 
QCQAY5M NKEKW8S 4QUWBFR 7VHNUTA LUARZ8N HPMEYWQ NHXMUSA 2MXEY3K
QSHL7RV 8VEUMXY

FOR ADDITIONAL INFORMATION REGARDING THIS BULLETIN PLEASE
CONTACT THE FOLLOWING EMAIL:
SYSTEMEDEGESTIONDESDONNEES-DATAMANAGEMENTSYSTEM(AT)EC.GC.CA

-------------------------------------------------------------------

DES LE 25 JUIN 2024, LE SERVICE METEOROLOGIQUE DU CANADA COMMENCERA
A EMETTRE REGULIEREMENT DE NOUVEAUX BULLETINS D OBSERVATIONS DE
SURFACE HORAIRES A PARTIR DES NAVIRES CANADIENS DANS LE FORMAT 
BUFR. CES MESSAGES RESPECTERONT LE GABARIT TM308014 DU FORMAT BUFR 
DE L OMM. LES EN-TETES DU BULLETIN POUR LES OBSERVATIONS HORAIRES 
DU NAVIRE SERONT:
ISSA20 CWAO
ISSB20 CWAO
ISSC20 CWAO
ISSD20 CWAO
ISSI20 CWAO
ISSJ20 CWAO
ISSK20 CWAO
ISSL20 CWAO
LES EN-TETES DU BULLETIN POUR LES OBSERVATIONS DES NAVIRES POUR DES
CONDITIONS METEOROLOGIQUES DANS L HEURE (TEMPETES ET SPREP) SERONT:
ISSA40 CWAO
ISSB40 CWAO
ISSC40 CWAO
ISSD40 CWAO
ISSI40 CWAO
ISSJ40 CWAO
ISSK40 CWAO
ISSL40 CWAO

CE QUI SUIT SONT LES IDENTIFIANTS DE STATION POUR LES NAVIRES 
ACTUELLEMENT RECRUTES DANS LE PROGRAMME ET DIFFUSES SOUS LES 
EN-TETES ENUMERES CI-DESSUS. L IDENTIFIANT UTILISE EST CELUI DE L 
EQUIPE D OBERVATIONS DU NAVIRE (SOT) IDENTIFIANT:

XMD7RGF UWBR5ZS DXY62VX DXJTY4L 6AK8NVU MFTW9ZM USFAPDJ FVU8WJS 
VCHQSQP 8QVU8QS ADC9EHA SDTNUWW SLTUAJL SGGQ4QX HUGT78Q 2AZY7HU 
HCW6ZCH ZJMN7RS PZBN6JW 2CXJVYJ ZTNDLFM YBVEWGM 6WUKLPV BAY6U5W 
VKCX8TW GYJN8YD 7SXVXXY YRQAXKE 9BEUB6Y 2QDZMLH KYGEJUP 8UWN4HQ 
QCQAY5M NKEKW8S 4QUWBFR 7VHNUTA LUARZ8N HPMEYWQ NHXMUSA 2MXEY3K
QSHL7RV 8VEUMXY

POUR PLUS D INFORMATIONS CONCERNANT LE PRESENT BULLETIN,
VEUILLEZ ECRIRE A L ADRESSE SUIVANTE :
SYSTEMEDEGESTIONDESDONNEES-DATAMANAGEMENTSYSTEM(AT)EC.GC.CA

EFFECTIVE / EN VIGUEUR - MAY / 31 MAI 2024 0000 UTC/TU

SIEWE ADM-MSC / SMA-SMC OTTAWA

The log entry:

2024-05-15 16:35:22,591 [INFO] 1 bulletins will be ingested
2024-05-15 16:35:22,591 [WARNING] Bufr without a valid internal date in section 1
2024-05-15 16:35:22,591 [WARNING] Use date from bulletin header
2024-05-15 16:35:22,595 [INFO] (2803 Bytes) Ingested in DB as /var/spool/px/db/20240515/NO/cmcin/CWAO/NOCN04_CWAO_081718___01878:cmcin:CWAO:NO:1:Direct:20240515163522

The data that gets ingested is:

NOCN04 CWAO 081718

GENOT TLTP. NO. 008 CCC

BILINGUAL MESSAGE - FRENCH TEXT TO FOLLOW ENGLISH TEXT
MESSAGE BILINGUE - LE TEXTE FRANCAIS SUIT LE TEXTE ANGLAIS

EFFECTIVE JUNE 25, 2024, THE METEOROLOGICAL SERVICE OF CANADA WILL
BEGIN REGULARLY TRANSMITTING NEW BULLETINS OF HOURLY AUTOMATED
SURFACE WEATHER OBSERVATIONS FROM RECRUITED CANADIAN SHIPS IN THE
BUFR FORMAT. THESE MESSAGES WILL FOLLOW WMO BUFR TEMPLATE TM308014.

THE BULLETIN HEADERS FOR HOURLY SHIP OBSERVATIONS WILL BE:
ISSA20 CWAO ISSB20 CWAO
ISSC20 CWAO
ISSD20 CWAO
ISSI20 CWAO
ISSJ20 CWAO
ISSK20 CWAO ISSL20 CWAO

THE BULLETIN HEADERS FOR SHIP OBSERVATIONS TRANSMITTING INTRA-HOUR
SIGNIFICANT WEATHER (STORM AND SPREP) OBSERVATIONS WILL BE:
ISSA40 CWAO ISSB40 CWAO
ISSC40 CWAO
ISSD40 CWAO
ISSI40 CWAO
ISSJ40 CWAO
ISSK40 CWAO ISSL40 CWAO

THE FOLLOWING ARE THE STATION IDENTIFIERS FOR SHIPS CURRENTLY 
RECRUITED INTO THE PROGRAMME AND DISSEMINATED UNDER THE HEADERS 
LISTED ABOVE. THE IDENTIFIER USED IS THE SHIP OBERVATIONS TEAM 
(SOT) IDENTIFIER.

XMD7RGF UWBR5ZS DXY62VX DXJTY4L 6AK8NVU MFTW9ZM USFAPDJ FVU8WJS 
VCHQSQP 8QVU8QS ADC9EHA SDTNUWW SLTUAJL SGGQ4QX HUGT78Q 2AZY7HU 
HCW6ZCH ZJMN7RS PZBN6JW 2CXJVYJ ZTNDLFM YBVEWGM 6WUKLPV BAY6U5W 
VKCX8TW GYJN8YD 7SXVXXY YRQAXKE 9BEUB6Y 2QDZMLH KYGEJUP 8UWN4HQ 
QCQAY5M NKEKW8S 4QUWBFR 7VHNUTA LUARZ8N HPMEYWQ NHXMUSA 2MXEY3K
QSHL7RV 8VEUMXY

FOR ADDITIONAL INFORMATION REGARDING THIS BULLETIN PLEASE
CONTACT THE FOLLOWING EMAIL:
SYSTEMEDEGESTIONDESDONNEES-DATAMANAGEMENTSYSTEM(AT)EC.GC.CA

-------------------------------------------------------------------

DES LE 25 JUIN 2024, LE SERVICE METEOROLOGIQUE DU CANADA COMMENCERA
A EMETTRE REGULIEREMENT DE NOUVEAUX BULLETINS D OBSERVATIONS DE
SURFACE HORAIRES A PARTIR DES NAVIRES CANADIENS DANS LE FORMAT 
BUFR. CES MESSAGES RESPECTERONT LE GABARIT TM308014 DU FORMAT BUFR 
DE L OMM. LES EN-TETES DU BULLETIN POUR LES OBSERVATIONS HORAIRES 
DU NAVIRE SERONT:
ISSA20 CWAO
ISSB20 CWAO
ISSC20 CWAO
ISSD20 CWAO
ISSI20 CWAO
ISSJ20 CWAO
ISSK20 CWAO
ISSL20 CWAO
LES EN-TETES DU BULLETIN POUR LES OBSERVATIONS DES NAVIRES POUR DES
CONDITIONS METEOROLOGIQUES DANS L HEURE (TEMPETES ET SPREP) SERONT:
ISSA40 CWAO
ISSB40 CWAO
ISSC40 CWAO
ISSD40 CWAO
ISSI40 CWAO
ISSJ40 CWAO
ISSK40 CWAO
ISSL40 CWAO

CE QUI SUIT SONT LES IDENTIFIANTS DE STATION POUR LES NAVIRES 
ACTUELLEMENT RECRUTES DANS LE PROGRAMME ET DIFFUSES SOUS LES 
EN-TETES ENUMERES CI-DESSUS. L IDENTIFIANT UTILISE EST CELUI DE L 
EQUIPE D OBERVATIONS DU NAVIRE (SOT) IDENTIFIANT:

XMD7RGF UWBR5ZS DXY62VX DXJTY4L 6AK8NVU MFTW9ZM USFAPDJ FVU8WJS 
VCHQSQP 8QVU8QS ADC9EHA SDTNUWW SLTUAJL SGGQ4QX HUGT78Q 2AZY7HU 
HCW6ZCH ZJMN7RS PZBN6JW 2CXJVYJ ZTNDLFM YBVEWGM 6WUKLPV BAY6U5W 
VKCX8TW GYJN8YD 7SXVXXY YRQAXKE 9

https://dd.weather.gc.ca/bulletins/alphanumeric/20240515/NO/CWAO/16/NOCN04_CWAO_081718___01878

petersilva commented 5 months ago

It should only be looking for BUFR on the first few lines... basically there is an AHL (abbreviated header line)... there may be a header, typically some numbers and carriage returns, and perhaps a control character... then the data itself starts with BUFR (or GRIB).

It should not match on any occurrence of BUFR at the beginning of any line in a bulletin.

petersilva commented 5 months ago

wait... this is in Sundew? wow... been there for 20 years, and nobody noticed? anyways... this stuff was ported to sr3, and it likely uses the same logic.

petersilva commented 5 months ago

probably just doing splitlines()[0:3] is enough to fix the problem in 99% of cases. A truly correct fix is harder.
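A minimal sketch of that idea, for illustration only. It is not the code in lib/bulletin.py: the function names, the `max_lines` parameter, and the assumption that the bulletin is held as `bytes` are all hypothetical. It contrasts a check that matches a marker at the start of any line (the failure mode reported here) with one restricted to the first few lines.

```python
# Markers that introduce binary payloads; b"\x89PNG" is "\211PNG" in octal notation.
BINARY_MARKERS = (b"BUFR", b"GRIB", b"\x89PNG")


def naive_is_binary(bulletin: bytes) -> bool:
    """Hypothetical reproduction of the bug: any line starting with a marker matches."""
    return any(line.startswith(BINARY_MARKERS) for line in bulletin.splitlines())


def is_binary(bulletin: bytes, max_lines: int = 3) -> bool:
    """Only inspect the first few lines (AHL, optional header, start of data)."""
    first_lines = bulletin.splitlines()[:max_lines]
    return any(line.startswith(BINARY_MARKERS) for line in first_lines)


text_bulletin = (
    b"NOCN04 CWAO 081718\n"
    b"\n"
    b"GENOT TLTP. NO. 008 CCC\n"
    b"\n"
    b"SURFACE WEATHER OBSERVATIONS FROM RECRUITED CANADIAN SHIPS IN THE\n"
    b"BUFR FORMAT. THESE MESSAGES WILL FOLLOW WMO BUFR TEMPLATE TM308014.\n"
)
binary_bulletin = b"ISSA20 CWAO 081700\r\r\nBUFR\x00\x00\x16\x04"

print(naive_is_binary(text_bulletin))  # True: the text bulletin is misdetected
print(is_binary(text_bulletin))        # False
print(is_binary(binary_bulletin))      # True
```

As noted above, this is a heuristic rather than a truly correct fix: a text bulletin whose second or third line happens to start with BUFR would still be misdetected, and a control character placed directly before the marker would hide it.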

reidsunderland commented 5 months ago

Maybe it is looking at all lines because of collections? I don't really know much about collections or how Sundew handles them.

petersilva commented 5 months ago

Collections are always of the same type of bulletin: they are either all TAC (traditional alphanumeric code) or all binary.

The way to do collections with BUFR is just to concatenate all the records together, so a collected BUFR would just start with one BUFR.
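To illustrate why a concatenated BUFR collection only needs the marker checked once at the start, here is a hedged sketch that walks such a payload using the total message length stored in BUFR section 0 (octets 5-7, for edition 2 and later) and the closing "7777" end section. `split_bufr` is a hypothetical helper, not something Sundew or sr3 is known to provide.

```python
def split_bufr(payload: bytes) -> list:
    """Split concatenated BUFR messages (edition 2+) using the section 0 length."""
    messages = []
    pos = payload.find(b"BUFR")
    while pos != -1 and pos + 8 <= len(payload):
        # Octets 5-7 of section 0 hold the total message length, big-endian.
        length = int.from_bytes(payload[pos + 4:pos + 7], "big")
        end = pos + length
        # Section 0 (8 octets) plus the "7777" end section is 12 octets at minimum.
        if length < 12 or end > len(payload) or payload[end - 4:end] != b"7777":
            break  # truncated or malformed: stop rather than guess
        messages.append(payload[pos:end])
        pos = payload.find(b"BUFR", end)
    return messages


def fake_bufr(body: bytes) -> bytes:
    """Build a structurally minimal stand-in message for the demo (not valid BUFR data)."""
    length = 4 + 3 + 1 + len(body) + 4
    return b"BUFR" + length.to_bytes(3, "big") + b"\x04" + body + b"7777"


collection = fake_bufr(b"\x00" * 10) + fake_bufr(b"\x01" * 20)
print(len(split_bufr(collection)))  # 2
```

Because each message carries its own length, the records after the first are located by offset rather than by scanning lines for "BUFR".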