Sinar / ocds-scripts

Scripts to convert Malaysian contract data/info into OCDS standard release and records
Creative Commons Attribution Share Alike 4.0 International

Convert MyProcurement Tender Award into OCDS JSON #6

Open kaerumy opened 7 years ago

kaerumy commented 7 years ago

Data: https://drive.google.com/open?id=0B4Iaflcl7wP0M2c3dHVlMkdRbEE

As with CIDB and JKR: test some samples against the OCDS validator and file issues or comments based on the results.

kmubiin commented 7 years ago

Hm, the MyProcurement data seems to have all fields filled in.

Get list of JSONL files
Found files: 1
Read MyProcurement data from JSONL files
Read from keputusan_tender_arkib_new.jsonl
Empty "id": 0
Empty "title": 0
Empty "tender_number": 0
Empty "ministry": 0
Empty "agency": 0
Empty "successful_tenderer": 0
Empty "agreed_price": 0
Total lines checked: 15006
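
A check like the one that produced the log above could be sketched roughly as follows. This is only an illustration, not the script from this repo: the function name `count_empty_fields` and the rule that "empty" means a missing key, `null`, or a whitespace-only string are assumptions.

```python
import json

# Field names taken from the log output above.
FIELDS = [
    "id", "title", "tender_number", "ministry",
    "agency", "successful_tenderer", "agreed_price",
]


def count_empty_fields(lines, fields=FIELDS):
    """Count records whose field is missing, null or blank.

    `lines` is any iterable of JSONL strings (e.g. an open file).
    Returns (counts_per_field, total_records_checked).
    """
    counts = {field: 0 for field in fields}
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return counts, total
```

To run it over the file named in the log, something like `with open("keputusan_tender_arkib_new.jsonl", encoding="utf-8") as fh: counts, total = count_empty_fields(fh)` should work, assuming the file is UTF-8 encoded.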

Despite being complete, the MyProcurement data contains bad entries.

  1. Need to remove period (.) from "id" field

    "id": "101."

  2. Need to remove the currency prefix (RM) and comma separators (,) from the amount

    "agreed_price": "RM2,357,443.40"

  3. Need to separate the strings in the "successful_tenderer" field

    "successful_tenderer": "SEDIAKAWAL (M) SDN. BHD.\n[NO. DAFTAR SYARIKAT: 105773-W]\n[NO. DAFTAR MOF/PKK: (NULL)]"
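
The three clean-ups listed above could look something like this. It is a minimal sketch based only on the sample values shown here; the helper names (`clean_id`, `parse_price`, `split_tenderer`) are hypothetical, and the assumption that every bracketed line has the form `[LABEL: VALUE]` may not hold for the whole data set.

```python
import re
from decimal import Decimal


def clean_id(raw):
    """Item 1: drop the trailing period, e.g. "101." -> "101"."""
    return raw.strip().rstrip(".")


def parse_price(raw):
    """Item 2: strip "RM" and commas, return the amount as a Decimal."""
    return Decimal(raw.replace("RM", "").replace(",", "").strip())


# Matches one bracketed detail line such as "[NO. DAFTAR SYARIKAT: 105773-W]".
DETAIL_RE = re.compile(r"\[(.+?):\s*(.*)\]$")


def split_tenderer(raw):
    """Item 3: split the name (first line) from the bracketed details."""
    lines = raw.split("\n")
    name = lines[0].strip()
    details = {}
    for line in lines[1:]:
        match = DETAIL_RE.match(line.strip())
        if match:
            details[match.group(1).strip()] = match.group(2).strip()
    return name, details
```

`Decimal` is used instead of `float` so that amounts such as 2357443.40 keep their exact value. Placeholder values like `(NULL)` are returned unchanged; deciding how to treat them is left open.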

Is there anything else to consider besides the above? That is probably all.

Update 2017.09.25: For item 3, just parse the bad entries as they are. Items 1 and 2 are straightforward: the script will remove the invalid characters.

kaerumy commented 7 years ago

My recommendation is to first convert it into OCDS format, bad data and all.

kmubiin commented 7 years ago

See the FIXME comments in commit b353e43 for the parts that cause validation errors at this point.

Validation errors for schema version 1.0 and 1.1:

  • tender:id is missing but required

Both schema versions have same validation errors for OCDS-MyProcurement.
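
One possible way to clear the `tender:id` error is to fill in the identifier before validating. The sketch below is not what commit b353e43 does; the helper `ensure_tender_id` is hypothetical, and using the source record's "tender_number" as the fallback value is an assumption that still needs to be agreed on.

```python
def ensure_tender_id(release, fallback_id):
    """Set release["tender"]["id"] when it is missing or empty.

    `release` is a dict shaped like an OCDS release; it is modified
    in place and also returned for convenience.
    """
    tender = release.setdefault("tender", {})
    if not tender.get("id"):
        tender["id"] = str(fallback_id)
    return release
```

For example, `ensure_tender_id(release, record["tender_number"])` would reuse the tender number from the MyProcurement record, leaving any existing `tender.id` untouched.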