Restructure merged_output.json

Very rough draft of a potential JSON schema we could use for datasets. Note that the file records are nested under a `resources` property within each dataset object, rather than having a separate JSON object per file for the same dataset.

{
    "type": "object",
    "properties": {
        "title": {
            "type": "string"
        },
        "owner": {
            "type": "string"
        },
        "pageURL": {
            "type": "string"
        },
        "dateCreated": {
            "type": "string"
        },
        "dateUpdated": {
            "type": "string"
        },
        "license": {
            "type": "string"
        },
        "description": {
            "type": "string"
        },
        "tags": {
            "type": "array",
            "description": "Could make an array of objects with specifier for tags from original dataset, ones manually added and ones added by the pipeline",
            "items": {
                "type": "string"
            }
        },
        "resources": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "fileName": {
                        "type": "string"
                    },
                    "fileSize": {
                        "type": "string"
                    },
                    "fileSizeUnit": {
                        "type": "string",
                        "description": "Could we do away with this prop and just enforce file sizes to be bytes?"
                    },
                    "fileType": {
                        "type": "string"
                    },
                    "assetUrl": {
                        "type": "string"
                    },
                    "dateCreated": {
                        "type": "string"
                    },
                    "dateUpdated": {
                        "type": "string"
                    },
                    "numRecords": {
                        "type": "number"
                    }
                },
                "required": [
                    "fileName",
                    "fileType",
                    "assetUrl"
                ]
            }
        }
    },
    "required": [
        "title",
        "owner",
        "pageURL",
        "dateCreated"
    ]
}
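
To illustrate the nesting, here is a minimal Python sketch of how flat, one-row-per-file records might be grouped into one object per dataset with a `resources` array. It is only a sketch under assumptions: the function name `nest_resources`, the grouping key (`title`, `owner`, `pageURL`) and the idea that the flat rows already use the property names from the schema are all hypothetical, not taken from the current pipeline. Resource-level `dateCreated`/`dateUpdated` are left out because they would collide with the dataset-level names in a flat row.

```python
# Hypothetical sketch: group flat per-file rows into nested dataset objects.
# Property names follow the draft schema; the flat input shape is assumed.

DATASET_KEYS = [
    "title", "owner", "pageURL", "dateCreated",
    "dateUpdated", "license", "description", "tags",
]
RESOURCE_KEYS = [
    "fileName", "fileSize", "fileSizeUnit",
    "fileType", "assetUrl", "numRecords",
]


def nest_resources(flat_rows):
    """Return one dict per dataset, each with a 'resources' list of files."""
    datasets = {}  # dicts keep insertion order, so output order is stable
    for row in flat_rows:
        # Assumed identity of a dataset: same title, owner and page URL.
        key = (row["title"], row["owner"], row["pageURL"])
        if key not in datasets:
            dataset = {k: row[k] for k in DATASET_KEYS if k in row}
            dataset["resources"] = []
            datasets[key] = dataset
        resource = {k: row[k] for k in RESOURCE_KEYS if k in row}
        datasets[key]["resources"].append(resource)
    return list(datasets.values())


# Made-up example rows: two files belonging to the same dataset.
example_rows = [
    {
        "title": "Bins",
        "owner": "Example Council",
        "pageURL": "https://example.org/bins",
        "dateCreated": "2022-01-01",
        "fileName": "bins.csv",
        "fileType": "CSV",
        "assetUrl": "https://example.org/bins.csv",
    },
    {
        "title": "Bins",
        "owner": "Example Council",
        "pageURL": "https://example.org/bins",
        "dateCreated": "2022-01-01",
        "fileName": "bins.json",
        "fileType": "JSON",
        "assetUrl": "https://example.org/bins.json",
    },
]

nested = nest_resources(example_rows)
```

With the example rows above, `nested` holds a single dataset object whose `resources` array has two entries, one per file.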

_Originally posted by @JackGilmore in https://github.com/OpenDataScotland/the_od_bods/issues/163#issuecomment-1268595248_

Restructure merged_output.json #226