Closed: cholmes closed this issue 3 months ago
Dataset at https://data.mendeley.com/datasets/vz6d7tw87f/1#file-5ac1542b-12ef-4dce-8258-113b5c5d87c9 and attached: LEM_dataset.zip (CC-BY license).
Note that this is a very small subset of Brazil. It will likely be the exact same as the fieldscapes data, i.e. not a subset of it. We should figure out what to do with data that is already subsetted, when fieldscapes is using the whole thing.
Hi, I am a researcher from WashU, and I would like to work on the Brazil dataset. Could someone please add me to the assignees? Thank you!
Hi everyone, I am slightly confused by the dataset here. Currently I am filling in the COLUMNS variable for my converter. However, when I look into the dataset's columns, the names appear to be different dates (months and years), which is quite different from the COLUMNS variable in other approved converters. In this case, should I create a bunch of fiboa custom fields that are based on months and years?
Great to see you take this on @Barasakar! Note that there is a draft PR of a converter for this at https://github.com/fiboa/cli/pull/49/files#r1641928561 (most of the datasets here do have drafts from fieldscapes - but I've not had the time to link them up).
This one is pretty simple and it's good to learn from scratch on one, but I think for many of the others you can start with the draft PR, clone it to your repo, and then make a new PR. I think many need to be renamed, and almost all need the link to the actual source data.
I don't think we have a great answer for how to deal with these types of columns. @m-mohr and I talked to Bayer and got some ideas about 'time', but I think we need a much more thought-out model to have a good mapping for these.
In the meantime I think the straightforward thing that's most 'true' to the source is to just create a bunch of columns with the same names as the source data. For the first pass you can just add them to COLUMNS and then use the MISSING_SCHEMAS section of the template to define each of them - no need to define a full fiboa extension or anything. In time, if we get a nice way to map crop type onto time spans, then we can adapt it. But my take is the first converter should just be a clean conversion of all available source fields to fiboa where relevant, and then include the rest of the fields as the original dataset provided them.
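Something like this, just as a sketch - the monthly column names below are made-up placeholders, not the dataset's actual field names, and the types are guesses:

```python
# Sketch of the COLUMNS / MISSING_SCHEMAS approach described above, following
# the fiboa-cli converter template conventions. The month/year column names
# and their types are illustrative placeholders only.

COLUMNS = {
    "geometry": "geometry",  # core fiboa property
    "id": "id",              # core fiboa property
    # pass the per-month source columns through under their original names
    "Oct_2019": "Oct_2019",
    "Nov_2019": "Nov_2019",
}

# custom (non-core) columns need a schema definition in the template
MISSING_SCHEMAS = {
    "properties": {
        "Oct_2019": {"type": "string"},
        "Nov_2019": {"type": "string"},
    }
}
```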
See https://github.com/fiboa/cli/pull/83 for an example of how I started with the existing PR/branch and then did more on top of it. I think when mine is out of draft we can just close the original one. Like I said, Brazil seems pretty simple so I think you can just make a fresh PR, but many of the other draft PRs in there have some good processing/filtering already done.
Hi all @cholmes @m-mohr,
I have finished the converter thanks to your help. I am currently working on creating tests and planning to upload the data to Source Cooperative following the instructions.
For Testing at step 6:
I am currently stuck at the ogr2ogr instruction. I decided to manually download the data with wget and the link, and locally unzipped the .zip file, which got me LEM_dataset.zip. However, I am not sure what command I should run to create a subset of this dataset. I am also slightly confused about the update I need to make to tests/test_convert.py. Do I add my dataset name (in this case br_ba_lem) to this line: @mark.parametrize('converter', ['at', 'be_vlg', 'de_sh', 'ec_lv', 'ec_si', 'fi', 'fr', 'nl', 'nl_crop', 'pt'])?
For uploading data to Source Cooperative:
I have registered an account and emailed hello@source.coop. Meanwhile, I am not sure what it means to "create a repository" at step 8.
Any clarification or suggestions will be appreciated! Thank you.
Sorry for the slow reply, @Barasakar.
For Testing at step 6
Yeah, for ZIP files it's a bit annoying.
In this case, extract the ZIP file into a folder, e.g. LEM_dataset.
Then run e.g. ogr2ogr LEM_dataset.shp -limit 100 LEM_dataset/LEM_dataset.shp
This should create a new set of files that look similar to the original ones, but smaller in size.
You then need to zip the four resulting files again into a LEM_dataset.zip.
Then copy/move the zip file to tests/data-files/convert/br_ba_lem/LEM_dataset.zip.
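If the command line route is annoying, a rough Python equivalent with geopandas (which is already in your environment) would be something like this - file names follow the example above:

```python
# Sketch: create a 100-feature subset of the extracted shapefile with
# geopandas instead of ogr2ogr. Paths/names follow the comment above.
import geopandas as gpd

# read only the first 100 features of the extracted shapefile
subset = gpd.read_file("LEM_dataset/LEM_dataset.shp", rows=100)

# writes LEM_dataset.shp plus the accompanying .shx/.dbf/.prj files,
# which you then zip into LEM_dataset.zip as described above
subset.to_file("LEM_dataset.shp")
```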
Do I add my dataset name (in this case br_ba_lem) to this line ...
Indeed. As I had to try these steps myself for the instructions above, I already added the test myself. But the same workflow should work for your next datasets.
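For reference, the updated line looks like this (the test body is elided here):

```python
from pytest import mark

# br_ba_lem added to the list of converters under test in tests/test_convert.py
@mark.parametrize('converter', ['at', 'be_vlg', 'br_ba_lem', 'de_sh', 'ec_lv', 'ec_si', 'fi', 'fr', 'nl', 'nl_crop', 'pt'])
def test_convert(converter):
    ...  # the actual test body runs the converter against the subset ZIP
```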
For uploading data to Source Cooperative:
I guess that's something for @PowerChell to answer...
Thanks.
I am facing some issues when following the instructions given here at step 16. When I run ogr2ogr -t_srs EPSG:4326 geo.json data/br_ba_lem.parquet, it shows me the following error:
(fiboa) jiayu.lin@crow:~/fiboa/cli/data$ ogr2ogr -t_srs EPSG:4326 geo.json br_ba_lem.parquet
ERROR 1: Unable to open datasource `br_ba_lem.parquet' with the following drivers.
-> `PCIDSK'
-> `PDS4'
-> `VICAR'
-> `MBTiles'
-> `EEDA'
-> `OGCAPI'
-> `ESRI Shapefile'
-> `MapInfo File'
-> `UK .NTF'
-> `LVBAG'
-> `OGR_SDTS'
-> `S57'
-> `DGN'
-> `OGR_VRT'
-> `Memory'
-> `CSV'
-> `NAS'
-> `GML'
-> `GPX'
-> `LIBKML'
-> `KML'
-> `GeoJSON'
-> `GeoJSONSeq'
-> `ESRIJSON'
-> `TopoJSON'
-> `Interlis 1'
-> `Interlis 2'
-> `OGR_GMT'
-> `GPKG'
-> `SQLite'
-> `WAsP'
-> `OpenFileGDB'
-> `DXF'
-> `CAD'
-> `FlatGeobuf'
-> `Geoconcept'
-> `GeoRSS'
-> `VFK'
-> `PGDUMP'
-> `OSM'
-> `GPSBabel'
-> `OGR_PDS'
-> `WFS'
-> `OAPIF'
-> `EDIGEO'
-> `SVG'
-> `Idrisi'
-> `ODS'
-> `XLSX'
-> `Elasticsearch'
-> `Carto'
-> `AmigoCloud'
-> `SXF'
-> `Selafin'
-> `JML'
-> `PLSCENES'
-> `CSW'
-> `VDV'
-> `GMLAS'
-> `MVT'
-> `NGW'
-> `MapML'
-> `GTFS'
-> `PMTiles'
-> `JSONFG'
-> `MiraMonVector'
-> `TIGER'
-> `AVCBin'
-> `AVCE00'
-> `HTTP'
I made sure all the dependencies are installed, such as gdal:
(fiboa) jiayu.lin@crow:~/fiboa/cli/data$ conda list
# packages in environment at /home/jiayu.lin/miniconda3/envs/fiboa:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
aiohappyeyeballs 2.3.4 pypi_0 pypi
aiohttp 3.10.0 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
arrow 1.3.0 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
aws-c-auth 0.7.25 h8a0f778_4 conda-forge
aws-c-cal 0.7.2 h7970872_1 conda-forge
aws-c-common 0.9.25 h4bc722e_0 conda-forge
aws-c-compression 0.2.18 hc649ecc_8 conda-forge
aws-c-event-stream 0.4.2 h04a40c0_20 conda-forge
aws-c-http 0.8.7 hc9bb02b_2 conda-forge
aws-c-io 0.14.18 h3e50d33_2 conda-forge
aws-c-mqtt 0.10.4 h674cf7e_16 conda-forge
aws-c-s3 0.6.4 hbe604ca_6 conda-forge
aws-c-sdkutils 0.1.19 hc649ecc_0 conda-forge
aws-checksums 0.1.18 hc649ecc_8 conda-forge
aws-crt-cpp 0.27.5 hba11562_5 conda-forge
aws-sdk-cpp 1.11.379 he20dfa5_2 conda-forge
azure-core-cpp 1.13.0 h935415a_0 conda-forge
azure-identity-cpp 1.8.0 hd126650_2 conda-forge
azure-storage-blobs-cpp 12.12.0 hd2e3451_0 conda-forge
azure-storage-common-cpp 12.7.0 h10ac4d7_1 conda-forge
azure-storage-files-datalake-cpp 12.11.0 h325d260_1 conda-forge
blosc 1.21.6 hef167b5_0 conda-forge
brotli 1.1.0 pypi_0 pypi
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.32.3 h4bc722e_0 conda-forge
ca-certificates 2024.7.4 hbcca054_0 conda-forge
certifi 2024.7.4 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
fiboa-cli 0.6.0 dev_0 <develop>
flatdict 4.0.1 pypi_0 pypi
fqdn 1.5.1 pypi_0 pypi
freexl 2.0.0 h743c826_0 conda-forge
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
gdal 3.9.1 py312h7eda2e2_11 conda-forge
geopandas 1.0.1 pypi_0 pypi
geos 3.12.2 he02047a_1 conda-forge
geotiff 1.7.3 hf7fa9e8_2 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.2 hd590300_0 conda-forge
glog 0.7.1 hbabe93e_0 conda-forge
icu 75.1 he02047a_0 conda-forge
idna 3.7 pypi_0 pypi
inflate64 1.0.0 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
isoduration 20.11.0 pypi_0 pypi
json-c 0.17 h1220068_1 conda-forge
jsonpointer 3.0.0 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
ld_impl_linux-64 2.40 hf3520f5_7 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240116.2 cxx17_he02047a_1 conda-forge
libarchive 3.7.4 hfca40fe_0 conda-forge
libarrow 17.0.0 h03aeac6_7_cpu conda-forge
libarrow-acero 17.0.0 he02047a_7_cpu conda-forge
libarrow-dataset 17.0.0 he02047a_7_cpu conda-forge
libarrow-substrait 17.0.0 hc9a23c6_7_cpu conda-forge
libblas 3.9.0 23_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcblas 3.9.0 23_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcurl 8.9.1 hdb1bdb2_0 conda-forge
libdeflate 1.20 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgdal-core 3.9.1 h8f9377d_10 conda-forge
libgfortran-ng 14.1.0 h69a702a_0 conda-forge
libgfortran5 14.1.0 hc5f4f2c_0 conda-forge
libgomp 14.1.0 h77fa898_0 conda-forge
libgoogle-cloud 2.28.0 h26d7fe4_0 conda-forge
libgoogle-cloud-storage 2.28.0 ha262f82_0 conda-forge
libgrpc 1.62.2 h15f2491_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
libkml 1.3.0 hbbc8833_1020 conda-forge
liblapack 3.9.0 23_linux64_openblas conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge
libparquet 17.0.0 haa1307c_7_cpu conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
libre2-11 2023.09.01 h5a48ba9_2 conda-forge
librttopo 1.1.0 hc670b87_16 conda-forge
libspatialite 5.1.0 h15fa968_9 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libthrift 0.20.0 hb90f79a_0 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 he7c6b58_4 conda-forge
libzlib 1.3.1 h4ab18f5_1 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 hd590300_1001 conda-forge
minizip 4.0.7 h401b404_0 conda-forge
multidict 6.0.5 pypi_0 pypi
multivolumefile 0.2.3 pypi_0 pypi
ncurses 6.5 h59595ed_0 conda-forge
numpy 2.0.1 py312h1103770_0 conda-forge
openssl 3.3.1 h4bc722e_2 conda-forge
orc 2.0.1 h17fec99_1 conda-forge
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pcre2 10.44 h0f59acf_0 conda-forge
pip 24.2 pyhd8ed1ab_0 conda-forge
pluggy 1.5.0 pypi_0 pypi
proj 9.4.1 h54d7996_1 conda-forge
psutil 6.0.0 pypi_0 pypi
py7zr 0.21.1 pypi_0 pypi
pyarrow 17.0.0 pypi_0 pypi
pyarrow-core 17.0.0 py312h9cafe31_1_cpu conda-forge
pybcj 1.0.2 pypi_0 pypi
pycryptodomex 3.20.0 pypi_0 pypi
pyogrio 0.9.0 pypi_0 pypi
pyppmd 1.1.0 pypi_0 pypi
pyproj 3.6.1 pypi_0 pypi
pytest 8.3.2 pypi_0 pypi
python 3.12.4 h194c7f8_0_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.12 4_cp312 conda-forge
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
pyzstd 0.16.0 pypi_0 pypi
re2 2023.09.01 h7f4b329_2 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.35.1 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rfc3339-validator 0.1.4 pypi_0 pypi
rfc3987 1.3.8 pypi_0 pypi
rpds-py 0.19.1 pypi_0 pypi
s2n 1.5.0 h3400bea_0 conda-forge
setuptools 72.1.0 pyhd8ed1ab_0 conda-forge
shapely 2.0.5 pypi_0 pypi
six 1.16.0 pypi_0 pypi
snappy 1.2.1 ha2e4443_0 conda-forge
sqlite 3.46.0 h6d4b2fc_0 conda-forge
texttable 1.7.0 pypi_0 pypi
tippecanoe 2.31.0 ha331528_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
types-python-dateutil 2.9.0.20240316 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
uri-template 1.3.0 pypi_0 pypi
uriparser 0.9.8 hac33072_0 conda-forge
urllib3 2.2.2 pypi_0 pypi
webcolors 24.6.0 pypi_0 pypi
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xerces-c 3.2.5 h666cd97_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yarl 1.9.4 pypi_0 pypi
zlib 1.3.1 h4ab18f5_1 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
I suspect there might be something wrong with my gdal installation? Please let me know if you need additional information. I apologize for the long message, and thank you!
@Barasakar Your GDAL doesn't have the Parquet driver. Which GDAL version are you using? You need 3.8 or later afaik.
My GDAL version is 3.9.1. I also installed pyarrow in the hope that it would solve the issue, but it didn't.
@Barasakar I'm not really a GDAL expert, sorry. If you want, just upload the data without PMTiles and I'll generate and upload the PMTiles for you.
I fixed it.
It seems simply installing GDAL isn't enough to run the command ogr2ogr -t_srs EPSG:4326 geo.json data/xx-yy.parquet, because the default GDAL (installed by pip install gdal) doesn't provide Parquet read support.
I was able to fix it by installing the following package:
conda install -c conda-forge libgdal-arrow-parquet
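To double-check that the driver is picked up (a quick sketch, assuming the GDAL Python bindings are installed; ogrinfo --formats works as well):

```python
# Verify that GDAL can now see the (Geo)Parquet driver
from osgeo import ogr

print("Parquet driver found:", ogr.GetDriverByName("Parquet") is not None)
```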
Sorry for the long wait, and thank you for the help during the process; I just published the data to Source Cooperative: https://beta.source.coop/repositories/fiboa/br-ba-lem/
I am not sure of the best way to present the COLUMNS section in the README.md, so when you have time, please let me know if I need to edit anything.
The data is currently marked as unlisted on Source Cooperative. Please feel free to mark it as listed if everything is good.
Thanks, well done. I've listed the repository and added it to our central list of datasets; you can now also find it on https://fiboa.org/map/.
Use public data from Brazil. Instructions available at https://github.com/fiboa/data/blob/main/HOWTO.md