Closed: kmubiin closed this issue 2 years ago.
For both existing firm and director
{
"statementGroups": [
{
"beneficialOwnershipStatements": [
{
"entity": {
"addresses": [
{
"address": "BLOK B-7-20, PPR KG. BARU AIR PANAS, JALAN USAHAWAN 6, KUALA LUMPUR, WILAYAH PERSEKUTUAN",
"country": "MY",
"postCode": "53200",
"type": "residence"
},
{
"address": "BLOK B-7-20, PPR KG. BARU AIR PANAS, JALAN USAHAWAN 6, KUALA LUMPUR, WILAYAH PERSEKUTUAN",
"country": "MY",
"postCode": "53200",
"type": "registered"
}
],
"foundingDate": "",
"id": "d4da3ada2883490d9c8aecc07d3bd289",
"identifiers": [
{
"id": "0120001010-WP060310",
"schema": "CIDB-registered"
}
],
"jurisdiction": "MY",
"name": "RENONGAN EMAS ENTERPRISE",
"statementDate": "2017-10-03",
"type": "registeredEntity"
},
"id": "7c9f62bc919646219a52ed1543399a5b",
"interestedParty": {
"id": "0c045fb351db4f7bba18a6b810c98dbc",
"identifiers": [
{
"id": "4ad2fb1abdc847e5894d3332914fb65a",
"schema": "UUID-HEX"
}
],
"name": "Joint shareholding",
"statementDate": "2017-10-03",
"type": "arrangement"
},
"interests": [
{
"interestLevel": "direct",
"share": {
"exact": 100
},
"type": "shareholding"
}
],
"statementDate": "2017-10-03"
},
{
"entity": {
"foundingDate": "",
"id": "9da1f9d9006c4e778013d3050eadf5ee",
"identifiers": [
{
"id": "4ad2fb1abdc847e5894d3332914fb65a",
"schema": "UUID-HEX"
}
],
"jurisdiction": "MY",
"name": "Joint shareholding",
"statementDate": "2017-10-03",
"type": "arrangement"
},
"id": "14d164627c2d4e778c820bad9471a7fb",
"interestedParty": {
"id": "f0ef4de0a0db499cb4c5886242e620f2",
"identifiers": [
{
"id": "MYS-IDCARD-670802025959",
"schema": "id-card"
}
],
"name": "ROSLI BIN AHMAD",
"nationalities": [
"MY"
],
"statementDate": "2017-10-03",
"type": "naturalPerson"
},
"interests": [
{
"interestLevel": "direct",
"share": {
"exact": 10
},
"type": "shareholding"
}
],
"statementDate": "2017-10-03"
},
{
"entity": {
"foundingDate": "",
"id": "1f088778816c431499017f433b1d31bb",
"identifiers": [
{
"id": "4ad2fb1abdc847e5894d3332914fb65a",
"schema": "UUID-HEX"
}
],
"jurisdiction": "MY",
"name": "Joint shareholding",
"statementDate": "2017-10-03",
"type": "arrangement"
},
"id": "cc3bcc43f8a2405fabb95d817ffc4f23",
"interestedParty": {
"id": "65099a60e561434db5f11d6ac0dc0005",
"identifiers": [
{
"id": "MYS-IDCARD-610907025483",
"schema": "id-card"
}
],
"name": "ABDUL GHANI BIN ISA RULHAK KHAN",
"nationalities": [
"MY"
],
"statementDate": "2017-10-03",
"type": "naturalPerson"
},
"interests": [
{
"interestLevel": "direct",
"share": {
"exact": 90
},
"type": "shareholding"
}
],
"statementDate": "2017-10-03"
}
],
"id": "7ac9d1b8b1bb4470a2ecfbcbfd9585c8-meta-70175"
}
]
}
From line 1 of ./data/bods-contractors201509220.jsonl
Remarks
Pasted the new output above. Apart from the "id" values, which are newly generated at runtime, there are no changes for this case. This also indicates that the pushed commits introduce the changes correctly.
Changes should appear only in the "existing firm but empty director" and "both empty firm and director (bad data)" cases. See the updated comments below.
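To confirm "no changes apart from runtime IDs" without eyeballing, two output lines can be compared after masking the generated IDs. This is a minimal sketch, not part of the original script: it assumes every runtime-generated ID is a 32-character lowercase hex string (uuid4().hex style), which is inferred from the pasted output, and the function names are hypothetical.

```python
import json
import re

# Assumption: runtime-generated IDs look like uuid4().hex, i.e. 32 lowercase
# hex characters, e.g. "7c9f62bc919646219a52ed1543399a5b" or the prefix of
# "7ac9d1b8b1bb4470a2ecfbcbfd9585c8-meta-70175".
RUNTIME_ID = re.compile(r"[0-9a-f]{32}")


def normalize(value):
    """Recursively replace every 32-hex-character run in strings with a placeholder."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, str):
        return RUNTIME_ID.sub("<uuid>", value)
    return value


def same_except_ids(old_line: str, new_line: str) -> bool:
    """Compare two JSONL output lines while ignoring runtime-generated IDs."""
    return normalize(json.loads(old_line)) == normalize(json.loads(new_line))
```

Masking is deliberately blunt: it also hides any hex identifier that is not runtime-generated (such as the "UUID-HEX" identifier shared by the joint-shareholding statements), and it cannot tell whether the same ID is still reused consistently across statements. Those cross-references need a separate check.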
For existing firm but empty director
{
"statementGroups": [
{
"beneficialOwnershipStatements": [
{
"entity": {
"addresses": [
{
"address": "PETI SURAT 36, MEMBAKUT, SABAH",
"country": "MY",
"postCode": "89727",
"type": "residence"
},
{
"address": "PETI SURAT 36, MEMBAKUT, SABAH",
"country": "MY",
"postCode": "89727",
"type": "registered"
}
],
"foundingDate": "",
"id": "5c67cd53b0ae4614a791ef217bd28024",
"identifiers": [
{
"id": "0120021209-SB078274",
"schema": "CIDB-registered"
}
],
"jurisdiction": "MY",
"name": "B & J ENTERPRISE",
"statementDate": "2017-10-03",
"type": "registeredEntity"
},
"id": "852e3f93595a43b28d4e3f66c78daf80",
"interestedParty": {
"description": "no beneficial owner in source",
"type": "unknown"
},
"interests": [],
"statementDate": "2017-10-03"
}
],
"id": "538c9515727748b7b3304af3d4fb7c67-meta-89375"
}
]
}
From line 7 of ./data/bods-contractors201509220.jsonl
Remarks
The current implementation may be wrong: when no beneficial owner is found, there should be only one statement instead of two (now fixed; the new output is pasted above).
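The one-statement rule can be checked mechanically on each parsed output line. A minimal sketch, assuming the output shape pasted above, where a missing owner is marked by "interestedParty": {"type": "unknown"}; the function name is hypothetical and not part of the conversion script.

```python
def check_null_party_groups(output: dict) -> list:
    """Return the IDs of statement groups that break the single-statement rule.

    Rule from the remark above: if a group contains an unknown (null) interested
    party, that group must consist of exactly one statement.
    """
    problems = []
    for group in output.get("statementGroups", []):
        statements = group.get("beneficialOwnershipStatements", [])
        has_unknown = any(
            statement.get("interestedParty", {}).get("type") == "unknown"
            for statement in statements
        )
        if has_unknown and len(statements) != 1:
            problems.append(group.get("id"))
    return problems
```

Run it on each line of the JSONL output (after json.loads); an empty list means every null-party group has a single statement.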
For both empty firm and director
{
"statementGroups": [
{
"beneficialOwnershipStatements": [
{
"entity": {
"addresses": [],
"foundingDate": "",
"id": "423e9f12e0fa4ee89a46aa83d684f59a",
"identifiers": [
{
"id": "",
"schema": ""
}
],
"jurisdiction": "MY",
"name": "Joint shareholding",
"statementDate": "2017-10-03",
"type": "unknownEntity"
},
"id": "b12c42c3c1c042f6af702ce8fccda50b",
"interestedParty": {
"description": "no beneficial owner in source",
"type": "unknown"
},
"interests": [],
"statementDate": "2017-10-03"
}
],
"id": "811116e767394728bcb780832578eab4-meta-176063"
}
]
}
From line 83 of ./data/bods-contractors201509220.jsonl
Remarks
This is an example of a result that likely does not need validation and is probably safe to ignore, since the source contains invalid entries like this one scattered among valid entries (updated output pasted above).
The invalid source entry for the same line:
{
"Alamat Berdaftar seperti Didalam Sijil SSM": {
"Alamat": "",
"Alamat 1": "",
"Alamat 2": "",
"Bandar": "",
"Emel": "",
"Fax": "",
"Negeri": "",
"Poskod": "",
"Telefon": ""
},
"Alamat Surat Menyurat": {
"Alamat": "",
"Alamat 1": "",
"Alamat 2": "",
"Bandar": "",
"Emel": "",
"Fax": "",
"Negeri": "",
"Poskod": "",
"Telefon": ""
},
"Profil": {
"Gred Kontraktor": "",
"Jenis Syarikat": "",
"Lain-lain Lesen": "-",
"Lesen Perdagangan": "-",
"Lesen Perniagaan": "-",
"Nombor Pendaftaran": "-",
"Nombor Pendaftaran Lain": "-",
"ROB": "-",
"ROC": "-",
"Tarikh Luput Sijil Pendaftaran CIDB": "-"
},
"directors": [],
"meta": {
"id": "176063",
"status": ""
},
"name": "",
"projects": []
}
In other words, there is nothing to validate in this bad-data example, unlike the previous "existing firm but empty director" example.
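Entries like this could be flagged before conversion. A minimal sketch, assuming "no data" in the source is always an empty string, "-", or an empty list/object (inferred from the entry above); the "meta" block is skipped because it still carries the source ID. Names are hypothetical.

```python
# Assumption: values the source uses to mean "no data", based on the entry above.
EMPTY_MARKERS = {"", "-"}


def is_blank(value) -> bool:
    """True if value is an empty marker, None, or a container holding only blanks."""
    if isinstance(value, dict):
        return all(is_blank(item) for item in value.values())
    if isinstance(value, list):
        return all(is_blank(item) for item in value)
    return value is None or value in EMPTY_MARKERS


def is_bad_source_entry(record: dict) -> bool:
    """True for source records with no usable firm or director data.

    "meta" is ignored because it keeps an ID (e.g. "176063") even when
    everything else in the record is empty.
    """
    return all(is_blank(value) for key, value in record.items() if key != "meta")
```

A record that has a firm name but an empty "directors" list is not flagged, so it still falls into the "existing firm but empty director" case.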
First self-validation
According to the null-statement example in the BODS GitHub repository, when no beneficial owner is located, there is no statement for a person at all; the only statement uses the NullParty component described in the BODS documentation.
By comparison, the current script still generates a statement for a person and places the NullParty component in that person statement (instead of in the statement for the firm).
So the existing script is probably wrong: when no beneficial owner is found, there should be only one statement instead of two.
I will fix this soon.
P.S.: The fix was reviewed in the commit referenced below this comment. The actual fix was done in stages, across several commits before the referenced one.
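For reference, the fixed behaviour can be sketched as a small builder that emits one statement per firm, with the null party as "interestedParty" and no separate person statement. This only mirrors the 2017 draft-era shape pasted in this thread (not current BODS), and the function names and the "-meta-" group ID format are read off the sample output; they are not the script's actual code.

```python
import uuid


def null_owner_statement(entity: dict, statement_date: str) -> dict:
    """Single statement for a firm whose source lists no beneficial owner."""
    return {
        "entity": entity,  # the firm itself; no person statement is emitted
        "id": uuid.uuid4().hex,  # runtime-generated, like the IDs in the output
        "interestedParty": {
            "description": "no beneficial owner in source",
            "type": "unknown",
        },
        "interests": [],  # nothing to declare without an owner
        "statementDate": statement_date,
    }


def null_owner_group(entity: dict, statement_date: str, source_id: str) -> dict:
    """Statement group holding exactly one null-party statement."""
    return {
        "beneficialOwnershipStatements": [null_owner_statement(entity, statement_date)],
        "id": f"{uuid.uuid4().hex}-meta-{source_id}",
    }
```

Usage, for the firm at line 7: null_owner_group(firm, "2017-10-03", "89375"), where firm is the entity dictionary built from the source record.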
This issue is now marked as "invalid" and can be closed by the original author.
The BODS format has changed since the 0.1 release in 2019 (archived). Like its OCDS counterpart, a BODS data review tool has been available since 2020 (archived). Because of the changes made since the 0.1 release, anything written against the 0.1 draft is deprecated and now deemed invalid. As such, the BODS-CIDB data converted four years ago will fail validation, as expected.
Whether future contributors should rewrite the script or update the data is a separate matter from this issue. As of today, there is no need for manual validation; use the online tool instead.
ORIGINAL ISSUE (2017)
This follows issue #1, which converts CIDB data to BODS format.
Unlike the Open Contracting Data Standard (OCDS), which has a stable schema version and a dedicated website for validation, the Beneficial Ownership Data Standard (BODS) has neither as of this posting.
Therefore, more eyes are needed to validate the Malaysian data, i.e. the BODS-CIDB data.
How to validate
The only way to validate BODS at the moment is by following these steps:
When viewing a JSON file, Firefox shows a "Filter JSON" search box in the upper-right corner. This is useful for quickly checking a particular ID string that is supposed to be shared between multiple statements.
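As a scripted counterpart to the Firefox filter, every identifier "id" in a JSONL output file can be indexed against the statements that mention it, so shared IDs (such as the "UUID-HEX" identifier of a joint shareholding) show up with all their statements at once. A minimal sketch, assuming the output shape pasted in this thread; the function name is hypothetical.

```python
import json
from collections import defaultdict


def index_identifier_ids(lines) -> dict:
    """Map each identifier "id" to the set of statement IDs that mention it.

    `lines` is any iterable of JSONL lines, e.g. an open file handle.
    Both the "entity" and the "interestedParty" side of each statement are scanned.
    """
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for group in record.get("statementGroups", []):
            for statement in group.get("beneficialOwnershipStatements", []):
                for side in ("entity", "interestedParty"):
                    party = statement.get(side, {})
                    for identifier in party.get("identifiers", []):
                        index[identifier.get("id")].add(statement.get("id"))
    return dict(index)
```

Usage: with open("./data/bods-contractors201509220.jsonl", encoding="utf-8") as f: index = index_identifier_ids(f). Note that IDs are collected across the whole file, so an ID reused on different lines appears together; restrict the input to one line to inspect a single statement group.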
Similar checking could be done with grep and other command-line tools, but the easiest way is to open the JSON in Firefox and check it in the pretty-print or collapsible-object layout.
Known bad data
Several issues have been identified in the Malaysian data, i.e. the BODS-CIDB data:
Inconsistent ID for entity: see line 105 of ./data/bods-contractors201509220.jsonl
Inconsistent "nationalities" for person: see line 35 of ./data/bods-contractors201509220.jsonl
Inconsistent "share" for person: see line 2 of ./data/bods-contractors201509220.jsonl
Inconsistent "address" for entity: see line 8 of ./data/bods-contractors201509220.jsonl
Workarounds for the issues noted above are already implemented in the script.
Known bad data at worst
Possible typo in "name" for entity: see line 908 of ./data/bods-contractors201509220.jsonl
There is no workaround for the worst bad data, because the source data itself is bad.
Known empty fields
Empty "foundingDate" for entity: "".
Empty "interests" for entity: [] when no beneficial owner is found. When a beneficial owner is found, "interests" should always contain a non-empty list.
Empty nested fields in "addresses" for entity: "" wherever the source has empty strings. See line 445 of ./data/bods-contractors201509220.jsonl
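The "interests" rule above can be expressed as a single check per statement: "interests" must be empty exactly when the interested party is unknown. A minimal sketch, assuming the output shape pasted in this thread; the function name is hypothetical.

```python
def interests_rule_violations(output: dict) -> list:
    """Return statement IDs whose "interests" contradict the known-empty-fields rule.

    Expected: "interests" == [] when the interested party is unknown (no
    beneficial owner found), and a non-empty list otherwise.
    """
    violations = []
    for group in output.get("statementGroups", []):
        for statement in group.get("beneficialOwnershipStatements", []):
            owner_missing = statement.get("interestedParty", {}).get("type") == "unknown"
            has_interests = bool(statement.get("interests"))
            # Both True (null party with interests) or both False (owner without
            # interests) break the rule.
            if owner_missing == has_interests:
                violations.append(statement.get("id"))
    return violations
```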
Example JSON
A few examples of result data will be pasted as JSON in separate comments below.
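There are three kinds of result data:
For both existing firm and director
For existing firm but empty director
For both empty firm and director (bad data)
The third kind (bad data) likely does not need validation but is pasted anyway as an example.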
Update 2017.10.02: Split "Known bad data" and "Known empty fields" into separate sections, because these are two different things.
Update 2017.10.03: Reviewed this issue, fixed typos, added text, and updated comments with the new output. Added a description and example data for each known item.