fiboa / data

A list of available fiboa datasets.

Create Brazil #25

Closed: cholmes closed this issue 3 months ago

cholmes commented 7 months ago

Use public data from Brazil and:

Instructions available at https://github.com/fiboa/data/blob/main/HOWTO.md

cholmes commented 6 months ago

Dataset at https://data.mendeley.com/datasets/vz6d7tw87f/1#file-5ac1542b-12ef-4dce-8258-113b5c5d87c9 and attached: LEM_dataset.zip. License: CC-BY.

Note that this is a very small subset of Brazil. It will likely be exactly the same as the fieldscapes data, i.e. not a subset of it. We should figure out what to do with data that is already subsetted when fieldscapes is using the whole thing.

Barasakar commented 4 months ago

Hi, I am a researcher from WashU, and I would like to work on the Brazil dataset. Could someone please add me to the assignees? Thank you!

Barasakar commented 4 months ago

Hi everyone, I am slightly confused by the dataset here. I am currently filling in the COLUMNS variable for my converter:

[Image: Screenshot 2024-07-24 at 11 49 51 AM]

However, when I look at the dataset's columns, they seem to be named after different dates:

[Image: Screenshot 2024-07-24 at 11 44 45 AM]

This is quite different from the COLUMNS variable in other approved converters. In this case, should I create a bunch of fiboa custom fields that are based on months and years?

cholmes commented 4 months ago

Great to see you take this on @Barasakar! Note that there is a draft PR of a converter for this at https://github.com/fiboa/cli/pull/49/files#r1641928561 (most of the datasets here do have drafts from fieldscapes - but I've not had the time to link them up).

This one is pretty simple and it's good to learn from scratch on one, but I think for many of the others you can start with the draft PR, clone it to your repo, and then make a new PR. I think many need to be renamed, and almost all need the link to the actual source data.

I don't think we have a great answer for how to deal with these types of columns. @m-mohr and I talked to Bayer and got some ideas about 'time', but I think we need a much more thought-out model to have a good mapping for these.

In the meantime I think the straightforward thing that's most 'true' to the source is to just create a bunch of columns of the same names as the source data. For the first pass you can just add them to 'columns' and then use the MISSING_SCHEMAS section of the template to define them each - don't need to like define a full fiboa extension or anything. In time if we get a nice way to map crop type into time span then we can adapt it. But my take is the first converter should just be a clean conversion of all available source fields to fiboa where relevant and then just include the rest of the fields as the original dataset included them.
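
A minimal sketch of that first-pass approach, assuming the converter template's module-level COLUMNS and MISSING_SCHEMAS variables. The month column names and the "ID" source column below are hypothetical placeholders rather than the dataset's actual field names, and the exact shape of MISSING_SCHEMAS is an assumption that should be checked against the template:

```python
# Hypothetical month/year columns; read the real names from the source shapefile.
MONTH_COLUMNS = ["Oct_2019", "Nov_2019", "Dec_2019"]

# Source column -> output column. "ID" is an assumed source column name.
COLUMNS = {
    "geometry": "geometry",
    "ID": "id",
}
# Keep the month columns under their original names, true to the source.
COLUMNS.update({name: name for name in MONTH_COLUMNS})

# Minimal schema entries for the columns that no fiboa schema defines.
# Assumed shape: a JSON-Schema-like dict with a "properties" mapping.
MISSING_SCHEMAS = {
    "properties": {name: {"type": "string"} for name in MONTH_COLUMNS},
}
```

Generating both dictionaries from one list keeps the column mapping and the schema entries in sync if the set of month columns changes later.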

cholmes commented 4 months ago

See https://github.com/fiboa/cli/pull/83 for an example of how I started with the existing PR / branch and then did more on top of it. I think when mine is out of draft we can just close the original one. Like I said, Brazil seems pretty simple so I think you can just make a fresh PR, but many of the other draft PRs in there have some good processing / filtering already done.

Barasakar commented 3 months ago

Hi all @cholmes @m-mohr,

I have finished the converter thanks to your help. I am currently working on creating tests and planning to upload the data to Source Cooperative with the instructions.

For Testing at step 6: I am currently stuck at the ogr2ogr instruction. I decided to manually download the data with wget and the link, and locally unzipped the .zip file, which got me LEM_dataset.zip. However, if I need to create a subset of this dataset, I am not sure what command line I should run. I am also slightly confused about the update I need to make for tests/test_convert.py. Do I add my dataset name (in this case br_ba_lem) to this line @mark.parametrize('converter', ['at', 'be_vlg', 'de_sh', 'ec_lv', 'ec_si', 'fi', 'fr', 'nl', 'nl_crop', 'pt'])?

For uploading data to Source Cooperative: I have registered an account and emailed hello@source.coop. Meanwhile, I am not sure what it means to "create a repository" at step 8.

Any clarification or suggestions will be appreciated! Thank you.

m-mohr commented 3 months ago

Sorry for the slow reply, @Barasakar.

> For Testing at step 6

Yeah, for ZIP files it's a bit annoying. In this case, extract the ZIP file into a folder, e.g. LEM_dataset. Then run e.g. ogr2ogr LEM_dataset.shp -limit 100 LEM_dataset/LEM_dataset.shp. This should create a new set of files that look similar to the original ones but are smaller in size. ZIP those four files again into a LEM_dataset.zip, then copy or move the zip file to tests/data-files/convert/br_ba_lem/LEM_dataset.zip.
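
The same subsetting and re-zipping steps could be scripted roughly as follows. This is only a sketch: the helper names subset_command and zip_sidecar_files are made up for illustration, and it assumes the subset files all share the stem LEM_dataset (typically .shp, .shx, .dbf and .prj):

```python
import zipfile
from pathlib import Path


def subset_command(src_shp, dst_shp, limit=100):
    """Build the ogr2ogr call that copies only the first `limit` features."""
    # ogr2ogr takes the destination first, then the source.
    return ["ogr2ogr", str(dst_shp), "-limit", str(limit), str(src_shp)]


def zip_sidecar_files(folder, stem, zip_path):
    """Zip every file in `folder` named `<stem>.*`, skipping existing archives."""
    files = sorted(
        p for p in Path(folder).glob(f"{stem}.*")
        if p.is_file() and p.suffix != ".zip"
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store files at the archive root, without the folder prefix.
            archive.write(path, arcname=path.name)
    return [p.name for p in files]
```

With GDAL installed, the command list can be passed to subprocess.run(..., check=True) before calling zip_sidecar_files(".", "LEM_dataset", "LEM_dataset.zip").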

> Do I add my dataset name (in this case br_ba_lem) to this line ...

Indeed. As I had to try these steps myself for the instructions above, I already added the tests myself. But the same workflow should work similarly for your next datasets.
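
For future datasets, the change amounts to something like the sketch below. The list is the one quoted from tests/test_convert.py with br_ba_lem appended; the position in the list and the helper fixture_folder are assumptions for illustration, and the folder layout follows the test-data path mentioned above:

```python
from pathlib import Path

# Converter IDs from the existing parametrize list, plus the new one.
CONVERTERS = ['at', 'be_vlg', 'de_sh', 'ec_lv', 'ec_si', 'fi', 'fr',
              'nl', 'nl_crop', 'pt', 'br_ba_lem']

# In tests/test_convert.py this list would feed the existing decorator:
#   @mark.parametrize('converter', CONVERTERS)


def fixture_folder(converter, root="tests/data-files/convert"):
    """Return the folder where a converter's subsetted test data is expected."""
    return Path(root) / converter
```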

> For uploading data to Source Cooperative:

I guess that's something for @PowerChell to answer...

Thanks.

Barasakar commented 3 months ago

I am facing some issues when following the instructions given here at step 16.

When I run: ogr2ogr -t_srs EPSG:4326 geo.json data/br_ba_lem.parquet.

It is showing me the following error:

(fiboa) jiayu.lin@crow:~/fiboa/cli/data$ ogr2ogr -t_srs EPSG:4326 geo.json br_ba_lem.parquet 
ERROR 1: Unable to open datasource `br_ba_lem.parquet' with the following drivers.
  -> `PCIDSK'
  -> `PDS4'
  -> `VICAR'
  -> `MBTiles'
  -> `EEDA'
  -> `OGCAPI'
  -> `ESRI Shapefile'
  -> `MapInfo File'
  -> `UK .NTF'
  -> `LVBAG'
  -> `OGR_SDTS'
  -> `S57'
  -> `DGN'
  -> `OGR_VRT'
  -> `Memory'
  -> `CSV'
  -> `NAS'
  -> `GML'
  -> `GPX'
  -> `LIBKML'
  -> `KML'
  -> `GeoJSON'
  -> `GeoJSONSeq'
  -> `ESRIJSON'
  -> `TopoJSON'
  -> `Interlis 1'
  -> `Interlis 2'
  -> `OGR_GMT'
  -> `GPKG'
  -> `SQLite'
  -> `WAsP'
  -> `OpenFileGDB'
  -> `DXF'
  -> `CAD'
  -> `FlatGeobuf'
  -> `Geoconcept'
  -> `GeoRSS'
  -> `VFK'
  -> `PGDUMP'
  -> `OSM'
  -> `GPSBabel'
  -> `OGR_PDS'
  -> `WFS'
  -> `OAPIF'
  -> `EDIGEO'
  -> `SVG'
  -> `Idrisi'
  -> `ODS'
  -> `XLSX'
  -> `Elasticsearch'
  -> `Carto'
  -> `AmigoCloud'
  -> `SXF'
  -> `Selafin'
  -> `JML'
  -> `PLSCENES'
  -> `CSW'
  -> `VDV'
  -> `GMLAS'
  -> `MVT'
  -> `NGW'
  -> `MapML'
  -> `GTFS'
  -> `PMTiles'
  -> `JSONFG'
  -> `MiraMonVector'
  -> `TIGER'
  -> `AVCBin'
  -> `AVCE00'
  -> `HTTP'

I made sure all the dependencies, such as GDAL, are installed:

(fiboa) jiayu.lin@crow:~/fiboa/cli/data$ conda list
# packages in environment at /home/jiayu.lin/miniconda3/envs/fiboa:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiohappyeyeballs          2.3.4                    pypi_0    pypi
aiohttp                   3.10.0                   pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
arrow                     1.3.0                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
aws-c-auth                0.7.25               h8a0f778_4    conda-forge
aws-c-cal                 0.7.2                h7970872_1    conda-forge
aws-c-common              0.9.25               h4bc722e_0    conda-forge
aws-c-compression         0.2.18               hc649ecc_8    conda-forge
aws-c-event-stream        0.4.2               h04a40c0_20    conda-forge
aws-c-http                0.8.7                hc9bb02b_2    conda-forge
aws-c-io                  0.14.18              h3e50d33_2    conda-forge
aws-c-mqtt                0.10.4              h674cf7e_16    conda-forge
aws-c-s3                  0.6.4                hbe604ca_6    conda-forge
aws-c-sdkutils            0.1.19               hc649ecc_0    conda-forge
aws-checksums             0.1.18               hc649ecc_8    conda-forge
aws-crt-cpp               0.27.5               hba11562_5    conda-forge
aws-sdk-cpp               1.11.379             he20dfa5_2    conda-forge
azure-core-cpp            1.13.0               h935415a_0    conda-forge
azure-identity-cpp        1.8.0                hd126650_2    conda-forge
azure-storage-blobs-cpp   12.12.0              hd2e3451_0    conda-forge
azure-storage-common-cpp  12.7.0               h10ac4d7_1    conda-forge
azure-storage-files-datalake-cpp 12.11.0              h325d260_1    conda-forge
blosc                     1.21.6               hef167b5_0    conda-forge
brotli                    1.1.0                    pypi_0    pypi
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.32.3               h4bc722e_0    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
certifi                   2024.7.4                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
fiboa-cli                 0.6.0                     dev_0    <develop>
flatdict                  4.0.1                    pypi_0    pypi
fqdn                      1.5.1                    pypi_0    pypi
freexl                    2.0.0                h743c826_0    conda-forge
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.6.1                 pypi_0    pypi
gdal                      3.9.1           py312h7eda2e2_11    conda-forge
geopandas                 1.0.1                    pypi_0    pypi
geos                      3.12.2               he02047a_1    conda-forge
geotiff                   1.7.3                hf7fa9e8_2    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.2                hd590300_0    conda-forge
glog                      0.7.1                hbabe93e_0    conda-forge
icu                       75.1                 he02047a_0    conda-forge
idna                      3.7                      pypi_0    pypi
inflate64                 1.0.0                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
isoduration               20.11.0                  pypi_0    pypi
json-c                    0.17                 h1220068_1    conda-forge
jsonpointer               3.0.0                    pypi_0    pypi
jsonschema                4.23.0                   pypi_0    pypi
jsonschema-specifications 2023.12.1                pypi_0    pypi
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240116.2      cxx17_he02047a_1    conda-forge
libarchive                3.7.4                hfca40fe_0    conda-forge
libarrow                  17.0.0           h03aeac6_7_cpu    conda-forge
libarrow-acero            17.0.0           he02047a_7_cpu    conda-forge
libarrow-dataset          17.0.0           he02047a_7_cpu    conda-forge
libarrow-substrait        17.0.0           hc9a23c6_7_cpu    conda-forge
libblas                   3.9.0           23_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           23_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   8.9.1                hdb1bdb2_0    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgdal-core              3.9.1               h8f9377d_10    conda-forge
libgfortran-ng            14.1.0               h69a702a_0    conda-forge
libgfortran5              14.1.0               hc5f4f2c_0    conda-forge
libgomp                   14.1.0               h77fa898_0    conda-forge
libgoogle-cloud           2.28.0               h26d7fe4_0    conda-forge
libgoogle-cloud-storage   2.28.0               ha262f82_0    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
libkml                    1.3.0             hbbc8833_1020    conda-forge
liblapack                 3.9.0           23_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_hac2b453_1    conda-forge
libparquet                17.0.0           haa1307c_7_cpu    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libprotobuf               4.25.3               h08a7969_0    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
librttopo                 1.1.0               hc670b87_16    conda-forge
libspatialite             5.1.0                h15fa968_9    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              14.1.0               hc0a3c3a_0    conda-forge
libthrift                 0.20.0               hb90f79a_0    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               he7c6b58_4    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              hd590300_1001    conda-forge
minizip                   4.0.7                h401b404_0    conda-forge
multidict                 6.0.5                    pypi_0    pypi
multivolumefile           0.2.3                    pypi_0    pypi
ncurses                   6.5                  h59595ed_0    conda-forge
numpy                     2.0.1           py312h1103770_0    conda-forge
openssl                   3.3.1                h4bc722e_2    conda-forge
orc                       2.0.1                h17fec99_1    conda-forge
packaging                 24.1                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
pcre2                     10.44                h0f59acf_0    conda-forge
pip                       24.2               pyhd8ed1ab_0    conda-forge
pluggy                    1.5.0                    pypi_0    pypi
proj                      9.4.1                h54d7996_1    conda-forge
psutil                    6.0.0                    pypi_0    pypi
py7zr                     0.21.1                   pypi_0    pypi
pyarrow                   17.0.0                   pypi_0    pypi
pyarrow-core              17.0.0          py312h9cafe31_1_cpu    conda-forge
pybcj                     1.0.2                    pypi_0    pypi
pycryptodomex             3.20.0                   pypi_0    pypi
pyogrio                   0.9.0                    pypi_0    pypi
pyppmd                    1.1.0                    pypi_0    pypi
pyproj                    3.6.1                    pypi_0    pypi
pytest                    8.3.2                    pypi_0    pypi
python                    3.12.4          h194c7f8_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.12                    4_cp312    conda-forge
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzstd                    0.16.0                   pypi_0    pypi
re2                       2023.09.01           h7f4b329_2    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.35.1                   pypi_0    pypi
requests                  2.32.3                   pypi_0    pypi
rfc3339-validator         0.1.4                    pypi_0    pypi
rfc3987                   1.3.8                    pypi_0    pypi
rpds-py                   0.19.1                   pypi_0    pypi
s2n                       1.5.0                h3400bea_0    conda-forge
setuptools                72.1.0             pyhd8ed1ab_0    conda-forge
shapely                   2.0.5                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
snappy                    1.2.1                ha2e4443_0    conda-forge
sqlite                    3.46.0               h6d4b2fc_0    conda-forge
texttable                 1.7.0                    pypi_0    pypi
tippecanoe                2.31.0               ha331528_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
types-python-dateutil     2.9.0.20240316           pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
uri-template              1.3.0                    pypi_0    pypi
uriparser                 0.9.8                hac33072_0    conda-forge
urllib3                   2.2.2                    pypi_0    pypi
webcolors                 24.6.0                   pypi_0    pypi
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xerces-c                  3.2.5                h666cd97_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.4                    pypi_0    pypi
zlib                      1.3.1                h4ab18f5_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

I suspect there might be something wrong with my gdal? Please let me know if you need additional information. I apologize for the long message, and thank you!

m-mohr commented 3 months ago

Your GDAL doesn't have the Parquet driver. Which GDAL version are you using? @Barasakar You need 3.8 or later afaik.

Barasakar commented 3 months ago

My GDAL version is 3.9.1. I also installed pyarrow in the hope that it would solve the issue, but it didn't.

m-mohr commented 3 months ago

@Barasakar I'm not really a GDAL expert, sorry. If you want, just upload the data without PMTiles and I'll generate and upload the PMTiles for you.

Barasakar commented 3 months ago

I fixed it.

I guess simply installing GDAL isn't enough to run the command ogr2ogr -t_srs EPSG:4326 geo.json data/xx-yy.parquet, because the original GDAL (installed by pip install gdal) doesn't provide Parquet read support.

I was able to fix it by installing the following package: conda install -c conda-forge libgdal-arrow-parquet
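
One way to confirm the fix before rerunning ogr2ogr is to check whether Parquet appears in the output of ogrinfo --formats. The sketch below is illustrative only: the function names are made up, and the line format it parses (short name, then a "-vector-" style capability tag) is assumed from typical GDAL output and may differ between versions:

```python
import re
import subprocess


def vector_driver_names(formats_output):
    """Extract driver short names from `ogrinfo --formats`-style text."""
    names = set()
    for line in formats_output.splitlines():
        # Assumed line shape: "  Parquet -vector- (rw+v): (Geo)Parquet"
        match = re.match(r"\s*(.+?) -[a-z,]+- ", line)
        if match:
            names.add(match.group(1))
    return names


def gdal_has_driver(name="Parquet"):
    """Run ogrinfo and report whether the named driver is listed."""
    result = subprocess.run(
        ["ogrinfo", "--formats"], capture_output=True, text=True, check=True
    )
    return name in vector_driver_names(result.stdout)
```

If gdal_has_driver() returns False in the active environment, the Parquet driver is still missing and the "Unable to open datasource" error above is to be expected.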

Barasakar commented 3 months ago

Thank you for the long wait and the help during the process; I just published the data to Source Cooperative: https://beta.source.coop/repositories/fiboa/br-ba-lem/

I am not sure the best way to present the COLUMN section in the README.md, so when you have time, please let me know if I need to edit anything.

The data is currently marked as unlisted on Source Cooperative. Please feel free to mark it as listed if everything is good.

m-mohr commented 3 months ago

Thanks, well done. I've listed the repository and added it to our central list of datasets, you can now also find it on https://fiboa.org/map/