Closed JustinGOSSES closed 2 years ago
💖 Thanks for opening this pull request! 💖
Please make sure you read our Contributing Guide and abide by our Code of Conduct.
A few things to keep in mind:
Run `make format` to make sure your code follows our style guide.

To load .h5 files into pandas, I need the PyTables package (published as `tables` on PyPI and `pytables` on conda). In your build pipeline, you install the requirements.txt packages with Miniconda. This creates a problem, as it tries to install `tables` via conda, where the package goes by a different name.
Normally, I think of requirements.txt as being for pip and environment.yml as being for conda. What would you suggest so that your build passes, but someone can also just install from requirements.txt with pip and not have to always use conda? Or do you require people to use conda?
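For context, the dependency chain can be probed like this (a sketch; the helper name is mine, not from the project):

```python
import importlib.util


def hdf5_support_available() -> bool:
    """Return True if pandas can read HDF5 files in this environment.

    pandas delegates HDF5 I/O to PyTables, which is importable as
    "tables" (its PyPI name) even when installed as "pytables" from
    conda-forge, so that is the module name to probe.
    """
    return importlib.util.find_spec("tables") is not None


# pandas.read_hdf("some_file.h5") raises ImportError when this is False.
```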
```
Installing dependencies
===============================================
Capturing dependencies from requirements.txt
Capturing dependencies from requirements-dev.txt
Installing collected dependencies:
pooch>=0.5
xarray
pandas
rasterio
tables
matplotlib
cmocean
cartopy
pytest
pytest-cov
coverage
pylint
flake8
sphinx==1.8.5
sphinx_rtd_theme
sphinx-gallery
numpydoc
twine
codecov
Collecting package metadata: ...working... done
Solving environment: ...working... failed
PackagesNotFoundError: The following packages are not available from current channels:
- tables
```
I seem to be blocked by the fact that loading HDF5 into pandas requires PyTables as a dependency. It is called `tables` on PyPI and `pytables` on conda. This is causing a problem because you're installing requirements.txt using both pip and conda in the build pipelines.
@JustinGOSSES sorry for the delay. These differences between conda-forge and PyPI have caused other headaches in the past. We can't just use the environment.yml file because it pins the Python version and we want to test multiple versions on CI. We can't use pip on CI because of troublesome dependencies. So we're in a bit of a bind.
Currently, `install_requires` is used to set the pip dependencies. This last one is going to cause problems, because I started reading in the requirements from requirements.txt to avoid duplication. Clearly this isn't going to work anymore.

I can see 2 ways forward; one is to keep the pip dependencies in setup.py and the conda dependencies in requirements.txt.
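The duplication-avoiding pattern described here, reading `install_requires` from requirements.txt, can be sketched roughly as follows (the helper name and package name are hypothetical, not from the project):

```python
# setup.py (sketch): populate install_requires from requirements.txt so
# the pip and conda dependency lists stay in sync. This is the pattern
# that breaks once requirements.txt contains conda-only names such as
# "pytables".


def read_requirements(path="requirements.txt"):
    """Parse a pip-style requirements file, skipping blanks and comments."""
    with open(path) as fin:
        return [
            line.strip()
            for line in fin
            if line.strip() and not line.startswith("#")
        ]


# The real setup.py would then call:
#   from setuptools import setup
#   setup(name="your-package", install_requires=read_requirements())
```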
So my question is: do we really need to have the data in HDF5? Is it much smaller than an xz-compressed CSV? Is there another binary format that would avoid those extra dependencies?
Thanks for the explanation. I'll try switching it to an xz-compressed CSV.

Switched, and it passes the checks!
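For reference, the xz route avoids the extra dependency because pandas infers the compression from the file extension using only the standard library. A minimal sketch, with hypothetical column and file names:

```python
import os
import tempfile

import pandas as pd

# pandas infers xz compression from the ".xz" suffix via the standard
# library's lzma module, so no PyTables-style extra dependency is needed.
df = pd.DataFrame({"depth": [100.0, 100.5], "facies": ["sand", "shale"]})

path = os.path.join(tempfile.mkdtemp(), "facies.csv.xz")  # hypothetical name
df.to_csv(path, index=False)

roundtrip = pd.read_csv(path)
```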
Responded to suggestions that are not inline comments:
Edited docstrings for clarity for users not familiar with well logs.
name: preprocessed McMurray facies dataset
about: requesting the addition of this as a new dataset

This is a dataset with facies and well log curve data from the McMurray & Wabiskaw formations in Alberta, Canada. More information about the processed dataset's history can be found here, and information about the original dataset here.
Desired dataset/model:
This is to add code for a facies dataset from the McMurray formation in Alberta, Canada.
I'll do some more work on this pull request to double-check format & style, as the `make format` command wasn't able to find Black.
I also need to think about tests.
In requirements.txt I had to add `tables` as a dependency, as I use a zipped .h5 file and pandas needs PyTables to open that type of file.
Fixes #
Reminders
Run `make format` and `make check` to make sure the code follows the style guide.
doc/api/index.rst

Do not merge this pull request yet. Mostly adding it for visibility and to avoid problems if merge is pressed.