Open kaerumy opened 7 years ago
Hm, MyProcurement data seem to have all fields complete.
Get list of JSONL files
Found files: 1
Read MyProcurement data from JSONL files
Read from keputusan_tender_arkib_new.jsonl
Empty "id": 0
Empty "title": 0
Empty "tender_number": 0
Empty "ministry": 0
Empty "agency": 0
Empty "successful_tenderer": 0
Empty "agreed_price": 0
Total lines checked: 15006
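For reference, a check like the one above could be sketched as follows. This is a minimal sketch, not the actual script used here: the function name `count_empty_fields` is hypothetical, and it assumes a field counts as "empty" when it is missing, null, or blank.

```python
import json
from collections import Counter

# Field names taken from the output above.
FIELDS = [
    "id", "title", "tender_number", "ministry",
    "agency", "successful_tenderer", "agreed_price",
]

def count_empty_fields(lines, fields=FIELDS):
    """Return (empty_counts, total_records) for an iterable of JSONL lines.

    Assumption: a field is "empty" if it is missing, None,
    or a string containing only whitespace.
    """
    empty = Counter({name: 0 for name in fields})
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        for name in fields:
            value = record.get(name)
            if value is None or str(value).strip() == "":
                empty[name] += 1
    return empty, total

# Usage (file name as in the output above):
# with open("keputusan_tender_arkib_new.jsonl", encoding="utf-8") as fh:
#     empty, total = count_empty_fields(fh)
#     for name in FIELDS:
#         print(f'Empty "{name}": {empty[name]}')
#     print(f"Total lines checked: {total}")
```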
Despite being complete, the MyProcurement data contains bad entries.
1. Need to remove the period (.) from the "id" field
   "id": "101."
2. Need to remove the currency prefix (RM) and comma separators (,) from the "agreed_price" amount
   "agreed_price": "RM2,357,443.40"
3. Need to separate the strings in the "successful_tenderer" field
   "successful_tenderer": "SEDIAKAWAL (M) SDN. BHD.\n[NO. DAFTAR SYARIKAT: 105773-W]\n[NO. DAFTAR MOF/PKK: (NULL)]"
Anything else to consider besides above? Probably that is all.
Update 2017.09.25: For item 3, just parse the bad entries "as is". The rest are straightforward: the script will remove the invalid characters.
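A rough sketch of that cleanup, assuming the values always look like the samples quoted above. The helper names (`clean_id`, `clean_price`, `clean_record`, `split_tenderer`) are hypothetical and not taken from the repository. `clean_record` leaves "successful_tenderer" untouched, as per the update; `split_tenderer` is only an optional illustration of how the field could be separated later.

```python
import re
from decimal import Decimal

def clean_id(raw):
    """Drop the trailing period: "101." -> "101"."""
    return raw.strip().rstrip(".")

def clean_price(raw):
    """Strip the "RM" prefix and comma separators, return a Decimal."""
    return Decimal(raw.replace("RM", "").replace(",", "").strip())

def clean_record(record):
    """Return a copy with "id" and "agreed_price" cleaned.

    "successful_tenderer" is passed through as is.
    """
    cleaned = dict(record)
    cleaned["id"] = clean_id(record["id"])
    cleaned["agreed_price"] = clean_price(record["agreed_price"])
    return cleaned

def split_tenderer(raw):
    """Optional: split the tenderer string into (name, details).

    Assumes the first line is the company name and each following
    line has the form "[LABEL: VALUE]"; other lines are ignored.
    """
    parts = raw.split("\n")
    name = parts[0].strip()
    details = {}
    for part in parts[1:]:
        match = re.match(r"\[(.+?):\s*(.*)\]$", part.strip())
        if match:
            details[match.group(1).strip()] = match.group(2).strip()
    return name, details
```

Using `Decimal` rather than `float` here is a judgment call to avoid rounding surprises with currency amounts; convert to whatever numeric type the OCDS output step expects.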
My recommendation is to convert it into OCDS format first, bad data and all.
See the FIXME comments in commit b353e43 for the parts that currently cause validation errors.
Validation errors for schema version 1.0 and 1.1:
- tender:id is missing but required
Both schema versions report the same validation errors for OCDS-MyProcurement.
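To catch this kind of error locally before running the full validator, a quick pre-check along these lines might help. This is only a sketch: `find_missing` is a hypothetical helper, the slash-separated path notation is my own, and it is not a substitute for validating against the OCDS schema.

```python
def find_missing(release, required_paths):
    """Return the required paths that are absent or empty in a release dict.

    Paths are slash-separated keys, e.g. "tender/id".
    """
    missing = []
    for path in required_paths:
        node = release
        for key in path.split("/"):
            if isinstance(node, dict) and key in node and node[key] not in (None, ""):
                node = node[key]  # descend one level
            else:
                missing.append(path)
                break
    return missing

# Example: a release with a tender block but no tender id.
# find_missing({"ocid": "x", "tender": {"title": "t"}}, ["ocid", "tender/id"])
```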
Data: https://drive.google.com/open?id=0B4Iaflcl7wP0M2c3dHVlMkdRbEE
As with CIDB and JKR, test some samples against the OCDS validator and file issues/comments based on the results.