Open wenzeslaus opened 4 years ago
On 21/05/20 05:13, Vaclav Petras wrote:
Describe the solution you'd like
After installing GRASS GIS to my system, I would like the |grass| package to be available without having to put it to the path manually.
[...]
Describe alternatives you've considered
[...]
Use grass-session
The following looks much better:
from grass_session import Session
import grass
In my eyes, this is the only sensible way, as using anything from the grass python library without defining the necessary environment variables that allows you to run GRASS modules doesn't seem very helpful.
However, I need to use two different packages, get the import order right, and most importantly, I need to install it from a completely different source. Additionally - and that may or may not be fixed by using dist-packages - grass_session/grass-session is not linked with the installation, so I may end up setting up |GRASSBIN| for it beforehand anyway so not that different from using |PYTHONPATH|.
No, it's just easier for those who do not like to set env variables ;-) I don't really understand why importing two packages is an issue, here.
Import from GRASS GIS
I can run the script inside GRASS GIS, but then I need to get GRASS GIS running somehow first, so I'm not using GRASS GIS from Python anymore, but more the other way around. This is great for writing GRASS GIS modules, but not for writing scripts and using GRASS GIS in them. It is also more difficult to apply this approach for running within some other environment such as JupyterLab or a Python IDE.
Getting "GRASS GIS running" doesn't mean anything else than setting a few environment variables. So, actually, running Session() from grass-session in a Python script, or running the grass7 startup script before launching Python, or using a lot of os.environ/sys.path.append calls to set up the environment manually, have exactly the same effect. It just depends on whether you want to do it all within one Python script or not.
You could also just define the necessary variables once and for all in your .bashrc or similar and never worry about it again... ;-)
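The point that a running GRASS GIS session amounts to a handful of environment variables can be illustrated with a minimal sketch. This is not GRASS GIS code: grass_environment is a hypothetical helper, and the set of variables it touches (GISBASE, PATH, LD_LIBRARY_PATH, PYTHONPATH) is an assumption that may differ between versions and platforms.

```python
import os


def grass_environment(gisbase, base_env=None):
    """Return a copy of base_env extended with GRASS-related variables.

    Hypothetical helper for illustration; the variable set is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GISBASE"] = gisbase

    def prepend(name, *paths):
        # Put the GRASS directories in front of any existing value.
        existing = env.get(name)
        parts = list(paths) + ([existing] if existing else [])
        env[name] = os.pathsep.join(parts)

    prepend("PATH", os.path.join(gisbase, "bin"), os.path.join(gisbase, "scripts"))
    prepend("LD_LIBRARY_PATH", os.path.join(gisbase, "lib"))
    prepend("PYTHONPATH", os.path.join(gisbase, "etc", "python"))
    return env


# Example installation path taken from later in this thread.
env = grass_environment("/usr/lib/grass78", base_env={"PATH": "/usr/bin"})
print(env["PYTHONPATH"])
```

The resulting dictionary could be passed as env= to subprocess calls, or its entries exported from a shell startup file, which is the "once and for all" variant mentioned above.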
Using --exec from Python
I could use GRASS GIS through command line interface as in |grass ... --exec|, but I'm throwing away the useful |grass.script| functionality and I will eventually end up re-implementing it again just outside of GRASS GIS.
This is why I personally find using GRASS_BATCH_JOB quite handy. Here's an example extract of an HPC instructions file of how I run a 2000+ lines Python script using grass.script on a node:
cd ${TMPDIR}
echo "g.mapsets walous_permanent,walous_orthos" >> ${TMPDIR}/walous_segmentation_batchjob${TUILE}.sh
echo "python ${HOME}/WALOUS/SRC/walous_obia_tiles_data_creation.py ${TUILE}" >> ${TMPDIR}/walous_segmentation_batchjob${TUILE}.sh
chmod +x $TMPDIR/walous_segmentation_batchjob${TUILE}.sh
export GRASS_BATCH_JOB=$TMPDIR/walous_segmentation_batchjob${TUILE}.sh
grass76 -c $TMPDIR/${LOCAL_GISDB}/tmplocation/${LOCAL_MAPSET}
unset GRASS_BATCH_JOB
rm -r $TMPDIR/${LOCAL_GISDB}
rm $TMPDIR/walous_segmentation_batchjob${TUILE}.sh
and in the walous_obia_tiles_data_creation.py file I just do this:
import grass.script as gscript
from grass.pygrass.modules.grid.grid import GridModule
Additional context
As far as I understand, this would bring Linux installations to where the Windows one is now (cbc7826 https://github.com/OSGeo/grass/commit/cbc782674d0bac95be0df8f6b21e7366006be5f5, example https://grasswiki.osgeo.org/wiki/Tools_for_Python_programming#PyCharm_IDE).
The python-grass79.bat only does have the job the Session() function does. If you look at the script below the PyCharm screenshots, you can see that you still have to:
import grass
and
gsetup.init(os.environ['GISBASE'], dbase, location, 'PERMANENT')
So, nothing different from what happens in GNU/Linux.
I'm really getting the impression that you are trying to make GRASS GIS into something it isn't, while also asking it to bring your coffee. ;-)
In my eyes, this is the only sensible way, as using anything from the grass python library without defining the necessary environment variables that allows you to run GRASS modules doesn't seem very helpful.
I think you are focusing on setting up the environment to run GRASS GIS modules. That's not what I'm talking about. I simply want to have a grass package on path after I install GRASS GIS. What happens after that is beyond this ticket, but it could be:
$ sudo apt install grass
$ python3
>>> import grass
>>> gs = grass.session.Session(...)
>>> gs.run_command("r.slope.aspect", ...)
..but the API is not my concern here. My concern is packaging and installation and how to get to the API.
Additional context As far as I understand, this would bring Linux installations to where the Windows one is now (cbc782674d0bac95be0df8f6b21e7366006be5f5)...
The python-grass79.bat only does have the job the Session() function does.
Session from the grass_session package sets the runtime environment to run modules and connects to a mapset. That's much more than what python-grass79.bat does. When import grass is readily available after installing GRASS GIS, something like Session can be imported directly from GRASS GIS and do what is necessary. This would happen without an external tool such as grass_session.
I'm really getting the impression that you are trying to make GRASS GIS into something it isn't...
I'm not sure what you mean by that, but I'm just trying to make GRASS GIS easy to use. The fact that there is a grass-session/grass_session project/package shows that there is a desire to use GRASS GIS Python package(s) in Python with minimal or ideally no prior setup. If the grass Python package installs alongside all other packages on Linux, then the grass Python package is readily available (i.e., import grass) to take care of any additional steps which are needed (e.g., grass.Session(...)).
I think my main problem is that I just don't understand the itch you are trying to scratch. And that we are discussing the solution, while what I think we might want to discuss first is the exact definition of the problem (BTW the discussion on PR #602 is probably similar in that aspect). :-)
So, let me try to formulate what I understand:
You would like to create a stand-alone python-grass library which would be devoid of any reference to the GRASS environment (i.e. no os.getenv("GISBASE") or similar calls, at least not in the __init__.py). The functions in that library would however be useless until you have initialized a "session", i.e., defined the environment variables (through whatever Python functions do this, e.g. a Session() call).
So, once you've imported python-grass, you would either
import grass.script.setup
or a grass-session equivalent that could be directly integrated into this python-grass package.
This would be more or less equivalent to slightly changing grass-session to integrate a parameter to the initialization of a Session that would automatically import the GRASS python libraries e.g.
Session(...., loadGrassPythonLibs=True)
or ?
I somehow, conceptually, prefer the latter as it makes it very apparent that you have to define a "session" first before using any of the other parts contained in the package, but I guess it probably wouldn't make much of a difference.
Do I understand correctly ?
And to move even further upstream in my understanding: what you are aiming for is for people to be able to do everything in a purely pythonic way, without having to have any understanding of the underlying system ? I.e. the python equivalent of the QGIS GRASS plugin ;-)
I mentioned most of these points already, but to reiterate:
1. import grass after ... install grass is just expected behavior.
i. GRASS GIS has Python API.
ii. User installs GRASS GIS.
iii. User should have access to that API from Python.
2. [...] python-grass79.bat.
3. [...] from osgeo import gdal.
4. [...] dist-packages at least on Ubuntu, GRASS GIS should behave the same way, so users can switch between them without additional friction.
5. [...] rgrass7 is available without any additional setup.
6. [...] import grass is readily available. It can integrate with grass-session code or just make the setup easier with the current grass.script.setup.init() or whatever else we think is appropriate.
What the API should look like is more a discussion for the mailing list, especially the modules versus C library functions access, but just to give you some idea, here is the current code to get from zero to a module call (assuming presence of sample data):
$ sudo apt install grass
$ python3
>>> import os
>>> import sys
>>> import subprocess
>>> gisbase = subprocess.check_output(["grass", "--config", "path"]).strip()
>>> os.environ['GISBASE'] = gisbase
>>> sys.path.append(os.path.join(gisbase, "etc", "python"))
>>> import grass.script as gs
>>> import grass.script.setup as gsetup
>>> rcfile = gsetup.init(gisbase, "data/grassdata", "nc_basic_spm_grass7", "user1")
>>> gs.run_command("r.slope.aspect", ...)
Here is how the code could look like:
$ sudo apt install grass
$ python3
>>> from grass.session import Session
>>> gs = Session("data/grassdata", "nc_basic_spm_grass7", "user1")
>>> gs.run_command("r.slope.aspect", ...)
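One way to picture the proposed interface is a thin class that wraps the manual steps from the longer listing. This is only a sketch under stated assumptions: no Session class exists in the grass package at the time of this discussion, the class and its start() method are hypothetical, and the init() call simply copies the argument order used in the listing above, which may differ in other GRASS GIS versions.

```python
import os
import subprocess
import sys


class Session:
    """Hypothetical wrapper around the manual setup steps (not a GRASS GIS API)."""

    def __init__(self, dbase, location, mapset, executable="grass"):
        self.dbase = dbase
        self.location = location
        self.mapset = mapset
        self.executable = executable
        self.rcfile = None  # Set by start().

    def start(self):
        """Locate GRASS GIS, extend sys.path, and initialize the session."""
        gisbase = subprocess.check_output(
            [self.executable, "--config", "path"], text=True
        ).strip()
        os.environ["GISBASE"] = gisbase
        sys.path.append(os.path.join(gisbase, "etc", "python"))
        # Imported late because the package only becomes importable after
        # the sys.path change above.
        import grass.script.setup as gsetup

        self.rcfile = gsetup.init(gisbase, self.dbase, self.location, self.mapset)
        return self

    def run_command(self, *args, **kwargs):
        """Forward to grass.script.run_command(); requires start() first."""
        if self.rcfile is None:
            raise RuntimeError("Call start() before running modules.")
        import grass.script as gs

        return gs.run_command(*args, **kwargs)


# Constructing the object has no side effects; start() does the real work.
session = Session("data/grassdata", "nc_basic_spm_grass7", "user1")
```

Note that this sketch still has to discover the installation through the grass executable, which is exactly the step that would disappear if the package were importable directly.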
I would prefer to create a PR for placing the grass package alongside other Python packages, but I don't know what needs to be done, so I was hoping we can discuss what needs to be done to make this happen.
Am 22. Mai 2020 17:02:57 MESZ schrieb Vaclav Petras notifications@github.com:
Here is how the code could look like:
$ sudo apt install grass
$ python3
>>> from grass.session import Session
>>> gs = Session("data/grassdata", "nc_basic_spm_grass7", "user1")
>>> gs.run_command("r.slope.aspect", ...)
So this would mean that Session() automatically sets GISBASE when called ?
I would prefer to create a PR for placing the grass package alongside other Python packages, but I don't know what needs to be done, so I was hoping we can discuss what needs to be done to make this happen.
As mentioned, I think what will be necessary is to find a mechanism that allows the python package to set GISBASE (and GRASS_PYTHON ? Others ?). And some clear error messages if the user tries to run anything else but Session () before having run the latter.
Moritz
So this would mean that Session() automatically sets GISBASE when called?
The particular interface is secondary and I consider it out of scope for this issue. The important part is that you can do import grass. What happens after that, i.e., what the API should look like, is for a different issue.
In other words, in the scope of this issue, it does not matter if there is some new API like this:
>>> from grass.session import Session
>>> gs = Session("data/grassdata", "nc_basic_spm_grass7", "user1")
Or the only API is just the API we already have:
>>> import grass.script.setup as gsetup
>>> gsetup.init("/path/to/grass/dist", "data/grassdata", "nc_basic_spm_grass7", "user1")
The idea is to have the grass package available, i.e., import grass, so that the user can use the API (whatever the API is).
As mentioned, I think what will be necessary is to find a mechanism that allows the python package to set GISBASE (and GRASS_PYTHON ? Others ?).
We already have the grass.script.setup.init() function, the grass-session project, and even lib/init/grass.py is in Python. It is quite possible I'm missing something, but I think we already have code to do all this. After all, grass.py is what starts GRASS GIS. The functions from there need to go to the library anyway for maintenance reasons (some are already there and some others are duplicated).
And some clear error messages if the user tries to run anything else but Session () before having run the latter.
Fair point, this would be nice to have, but it is really a tertiary issue. This is in fact the current situation after setting the right sys.path for import grass to succeed. The user sets that, does import grass, but the code does not have any special messages or checks in place to inform about the need to call the grass.script.setup.init() function.
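A check of the kind asked for could be sketched as a decorator that fails early with a readable message. This is an illustration only: requires_session, GrassSessionError, and the stand-in run_command below are hypothetical names, and treating GISBASE and GISRC as the markers of an initialized session is an assumption.

```python
import functools
import os


class GrassSessionError(RuntimeError):
    """Raised when a function needs a GRASS session that is not set up."""


def requires_session(func):
    """Fail early, with a readable message, if session variables are missing."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Assumption: these two variables indicate an initialized session.
        missing = [name for name in ("GISBASE", "GISRC") if not os.environ.get(name)]
        if missing:
            raise GrassSessionError(
                f"{func.__name__}() needs a GRASS session, but "
                f"{', '.join(missing)} is not set. "
                "Call grass.script.setup.init() first."
            )
        return func(*args, **kwargs)

    return wrapper


@requires_session
def run_command(name, **kwargs):
    # Placeholder body; a real function would run a GRASS module here.
    return name
```

Without a session, calling run_command("g.region") raises GrassSessionError naming the missing variables instead of failing somewhere deeper in the call.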
However, let's consider a similar, but slightly different situation where the user does not know about the need to set the right sys.path. The user installs GRASS GIS, does import grass, and gets ImportError: No module named 'grass'. In this case, not only is there no clear error message informing the user about the need to set sys.path/PYTHONPATH (or to run the code from GRASS GIS), but more importantly, we don't have any way of providing that information to the user. Hence, this issue.
As for the changes needed to resolve this issue, what is not clear to me is what needs to be done for the grass package (i.e., code which resides in /lib/python/ and installs into $PREFIX/etc/python/grass) to be installed alongside other Python packages (in the system) on Linux (and perhaps FreeBSD).
As for the changes needed to resolve this issue, what is not clear to me is what needs to be done for the grass package (i.e., code which resides in /lib/python/ and installs into $PREFIX/etc/python/grass) to be installed alongside other Python packages (in the system) on Linux (and perhaps FreeBSD).
IIUC, we are talking about implementing parts of the current startup script into the __init__.py.
Whether or not GISBASE is defined internally (i.e. at install time) or by the user at runtime is thus not tertiary or secondary, but a fundamental question that will determine the implementation.
If runtime definition of GISBASE is acceptable, then possibly taking out reading the content of this variable in __init__.py is enough, plus possibly some way of checking whether the python version used is acceptable as GRASS_PYTHON. However, if we go towards a stand-alone python-grass package I personally would prefer to go all the way, i.e. not have the user define GISBASE.
Reading through this conversation, I agree with Moritz that
$ python3
>>> import grass
does not make sense. Users must establish a GRASS session before using any functionality of the GRASS python library, otherwise there will be lots of confusing error messages, and user feedback like "I imported grass but it does not work. GRASS does not work."
I guess you are thinking about a Python interface similar to the R grass7 package which would require to integrate grass-session into lib/python.
This raises a chicken and egg problem: the GRASS python lib only works if GISBASE and all the other env vars have been defined, but you want the GRASS python lib to define GISBASE and all the other env vars defined by grassxy.py.
I regard this feature request as invalid as long as there is no mechanism in lib/python to establish a GRASS session, similar to the R grass7 package.
$ python3
>>> import grass
does not make sense.
These work:
import subprocess
import pip
import antigravity
import osgeo
import qgis
import docker
import apt
import boost
import vboxapi
import vtk
import lzma
So, for a Python user, import grass makes sense too.
Users must establish a GRASS session before using any functionality of the GRASS python library, otherwise there will be lots of confusing error messages, and user feedback like "I imported grass but it does not work. GRASS does not work."
And what about ImportError: No module named 'grass' after installing GRASS GIS? "I installed grass, but it won't even import. GRASS does not work." Why would this situation be less important than the one you are describing?
I'm more concerned about the users we don't have because grass does not even import than the complaints after people try to use it.
I guess you are thinking about a Python interface similar to the R grass7 package which would require to integrate grass-session into lib/python.
You can read one of the examples in this issue showing the possible APIs, but the point is not how the API looks like and what exactly each step is doing. This issue is about reducing the steps required to get to the API and about getting to the API in the same way as Python users are used to from other packages.
The reason I brought up R in the initial description and in the point 5 above is that rgrass7 is available without any additional setup before the import/library() call.
This raises a chicken and egg problem: the GRASS python lib only works if GISBASE and all the other env vars has been defined, but you want the GRASS python lib to define GISBASE and all the other env vars defined by grassxy.py.
I'm saying that the statement import grass should work without raising an error after the user installs GRASS. This now works only after setting sys.path prior to that import statement, which is not the expected behavior of a Python package.
What exactly should happen during the import and afterwards, i.e., the API and implementation is beyond what I aim to discuss here. I'm happy to discuss this elsewhere.
In the context of making import grass readily available, you can consider the Session API from my examples above, or grass-session code moved to the grass package, or even the current grass.script.setup.init() function. The point here is that once grass is imported, functions in grass can be used to set up any environment necessary or different environments based on user's needs.
The only prerequisite for that is that import grass works, and for that to work out of the box, it needs to be readily available to Python, which is achieved on Linux by placing the Python package alongside other Python packages in the system. As I already mentioned, what is needed for this installation step is not clear to me and therefore I opened this issue, so we can discuss that.
I regard this feature request as invalid as long as there is no mechanism in lib/python to establish a GRASS session, similar to the R grass7 package.
There is actually a grass.script.setup.init() function in lib/python/script which is similar to initGRASS() from rgrass7.
I think these are two separate problems: getting to the API and what the API looks like. We can resolve them both together or in any order, but there is no need for one to be blocked by the other.
As for the changes needed to resolve this issue, what is not clear to me is what needs to be done for the grass package (i.e., code which resides in /lib/python/ and installs into $PREFIX/etc/python/grass) to be installed alongside other Python packages (in the system) on Linux (and perhaps FreeBSD).
IIUC, we are talking about implementing parts of the current startup script into the __init__.py.
As I said in the original description, I'm talking about being able to import the grass package without having to put it to the path manually after installing GRASS GIS in the system (Linux).
What happens during import, i.e., in __init__.py (or multiple __init__.py files), or what the user needs to do afterwards to get things going is important to me, but not in this issue. Here what is important is that the grass package will be available and thus it can be used to do the setup in one way or the other.
Whether or not GISBASE is defined internally (i.e. at install time) or by the user at runtime is thus not tertiary or secondary, but a fundamental question that will determine the implementation.
Having GISBASE defined here or there does not influence the Python import system. What you are saying is important for how the API will look like or how convenient or reliable an automatic setup could be.
...plus possibly some way of checking whether the python version used is acceptable as GRASS_PYTHON.
This will be the same as for all the other packages installed to the system, i.e., ensuring the right Python is not the job of the package itself, but the installation/distribution/system should take care of it, i.e., the package should be in the dist-packages directory of the right Python.
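To see which directories a given interpreter treats as its package locations, the standard library's sysconfig module can be queried. The sketch below is only an illustration; package_directories is a hypothetical helper name, and on Debian-based systems the reported paths may differ from /usr/lib/python3/dist-packages depending on the install scheme in use.

```python
import sys
import sysconfig


def package_directories():
    """Return the directories where this interpreter expects installed packages."""
    paths = sysconfig.get_paths()
    # purelib holds pure-Python packages, platlib holds compiled extensions;
    # the two are often the same directory.
    return sorted({paths["purelib"], paths["platlib"]})


print(sys.version_info[:2], package_directories())
```

Running this with each Python installed on the system shows why placing the package is a distribution concern: every interpreter reports its own locations.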
However, if we go towards a stand-alone python-grass package I personally would prefer to go all the way, i.e. not have the user define GISBASE.
Yes, that would be ideal. GISBASE can be baked into the grass package or obtained from the grass executable (which can be baked into the grass package or assumed to be on path). However, that's again not the focus of this issue.
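The lookup order described here could be sketched as follows. This is not existing GRASS GIS code: find_gisbase and BAKED_GISBASE are hypothetical names, and checking the GISBASE environment variable between the two options is an added assumption. The grass --config path call is the one already used in the examples in this thread.

```python
import os
import shutil
import subprocess

# Value a packager could write at install time; None means "not baked in".
BAKED_GISBASE = None


def find_gisbase(baked=BAKED_GISBASE, executable="grass"):
    """Return the GRASS installation directory, or None if it cannot be found."""
    if baked:
        return baked
    from_env = os.environ.get("GISBASE")
    if from_env:
        return from_env
    if shutil.which(executable) is None:
        return None  # No executable on PATH to ask.
    try:
        result = subprocess.run(
            [executable, "--config", "path"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


print(find_gisbase())
```

Returning None rather than raising leaves it to the caller to decide whether a missing installation is an error or just a reason to ask the user.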
My concern is how to achieve that a simple import grass works, i.e., grass imports without a prior setup on Linux where other Python packages are installed in the system and import without a prior setup.
On 27/05/20 05:11, Vaclav Petras wrote:
As I said in the original description, I'm talking about being able to import the grass package without having to put it to the path manually after installing GRASS GIS in the system (Linux).
What happens during import, i.e., in |__init__.py| (or |__init__.py|s) or what user needs to do afterwards to get things going is important to me, but not in this issue. Here what is important is that the grass package will be available and thus it can be used to do the setup in one way or the other.
[...]
My concern is how to achieve that simple |import grass| works, i.e., grass imports without a prior setup on Linux where other Python packages are installed in the system and import without a prior setup.
I think that this is an artificial separation which I don't really agree with. How to handle the question of GISBASE is fundamental to the question of how to make a python-grass package importable and usable without knowing about paths. The way we answer this question will determine the necessary means to make a python-grass package possible.
I don't know what the aim is of just being able to 'import grass' if there is no reflection about the next step, and so I don't see the interest in working on creating an importable package just to then start the discussion on how GISBASE should be defined.
If the answer to that question is that GISBASE should be defined at coding time by the user, then creating an importable python-grass is probably easier, but documentation has to be very good and we risk getting a lot of work helping users. If GISBASE is defined at install time, then probably this will make life easier for both users and us.
Getting back to this issue:
When I try from grass.session import Session, I get the error ModuleNotFoundError: No module named 'grass.session'. I am trying to run the SEBAL model using Python and GRASS GIS. The line "from grass_session import session" is also not working; I get the error "RuntimeError: Cannot find GRASS GIS start script: 'grass', set the right one using the GRASSBIN environm. variable". Any help will be appreciated.
@musmanbakht import grass does not work yet (hence this issue) and import grass_session requires GRASS GIS to be installed (not ideal). Please open a discussion to debug your specific problem.
To add to this discussion, I have taken on an old piece of software which uses GRASS, which I am currently migrating to new library versions. I found this issue after searching for ModuleNotFoundError: No module named 'grass', having performed the exact steps @wenzeslaus describes:
$ python
>>> import grass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'grass'
As a long-time Python developer, and short-time GRASS user (yes, starting this week), it really is the expected behaviour that on installing a library with Python API bindings, those bindings will be automatically available through the system Python path.
This issue describes setting up GISBASE environment variable before working with GRASS. Other Python packages which require some configuration, like this, will allow the import of the module, and then raise an error. This is in line with the principle of self-documenting code. IMHO the expected behaviour if these environment variables are not set would be to raise a descriptive error message (maybe even linking to the appropriate documentation!) so that a novice user such as myself can be redirected to RTFM rather than raising an issue to say "GRASS does not work".
In fact, the error raised after adding /usr/lib/grass78/etc/python to sys.path is an excellent start:
import os
import sys
import subprocess
gisbase = subprocess.check_output(["grass", "--config", "path"]).strip()
sys.path.append(os.path.join(str(gisbase, 'UTF-8'), "etc", "python"))
import grass
Gives a stacktrace ending in:
File "/vagrant/wps_processes/wps_wsgi.py", line 10, in <module>
import grass
File "/usr/lib/grass78/etc/python/grass/__init__.py", line 21, in <module>
_LOCALE_DIR = os.path.join(os.getenv("GISBASE"), 'locale')
File "/usr/lib/python3.10/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Which demonstrates to the user that they need to set up a $GISBASE env variable. I'm not expecting the software to do this for me: I use several Python packages which have exhibited similar behaviour, but the fact that it tells me short-cuts having to search the web for the original import error. One could even achieve a descriptive error message by handling the TypeError in a try/except in __init__.py.
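A descriptive message of this kind could look roughly like the sketch below. It is not the actual __init__.py code: locale_dir is a hypothetical function, the wording of the message is invented for illustration, and the sketch checks for the missing variable explicitly instead of catching the TypeError.

```python
import os


def locale_dir(environ=None):
    """Return the locale directory, or raise a descriptive error."""
    environ = os.environ if environ is None else environ
    gisbase = environ.get("GISBASE")
    if not gisbase:
        # Explicit check replaces the bare TypeError from os.path.join(None, ...).
        raise RuntimeError(
            "The GISBASE environment variable is not set. "
            "Set it to the GRASS GIS installation directory "
            "(see the output of 'grass --config path') before using this package."
        )
    return os.path.join(gisbase, "locale")


print(locale_dir({"GISBASE": "/usr/lib/grass78"}))
```

Whether such a check should run at import time or only when the value is first needed is part of the API question discussed earlier in the thread.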
Putting the grass package in /usr/lib/python3/dist-packages as suggested would enable this behaviour without having to manually scrape paths from the grass subprocess, which is a massive usability gain for the Python end-user. I suspect this could even be accomplished by symlinking to the directory in /usr/lib/grass78/etc/python within the apt install script.
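The symlink idea can be tried out safely in temporary directories. The following is a sketch only: link_package is a hypothetical helper, the temporary directories stand in for the real installation and dist-packages locations, and an actual packaging script would more likely do this in its own install step with the appropriate privileges.

```python
import os
import tempfile


def link_package(source_dir, target_parent, name="grass"):
    """Symlink source_dir into target_parent under the given name."""
    link_path = os.path.join(target_parent, name)
    if os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} already exists; refusing to overwrite")
    os.symlink(source_dir, link_path, target_is_directory=True)
    return link_path


# Demonstration in temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "etc", "python", "grass")
    target = os.path.join(root, "dist-packages")
    os.makedirs(source)
    os.makedirs(target)
    link = link_package(source, target)
    linked_ok = os.path.realpath(link) == os.path.realpath(source)
    print(linked_ok)
```

Refusing to overwrite an existing entry is a deliberate choice here, since replacing something in a system package directory could break another installation.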
Hope this comes as useful rather than a +1 or "me too" type issue comment. Thanks for all your work developing and maintaining this project.
Thanks for reviving this issue with an additional analysis!
I'd like to keep this discussion going and add my support - it would be so much nicer if we could simply start GRASS like this:
import grass.jupyter as gj
import grass.script as gs
from grass.jupyter import TimeSeriesMap
session = gj.init("~/Grassdata/nc_climate_spm_2000_2012")
Rather than using something like the following (with complaints from linters etc.):
import subprocess
import sys
sys.path.append(
subprocess.check_output(["grass", "--config", "python_path"], text=True).strip()
)
import grass.jupyter as gj
import grass.script as gs
from grass.jupyter import TimeSeriesMap
session = gj.init("~/Grassdata/npm")
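Until the package is importable out of the box, the path lookup can at least be wrapped in one function that does nothing when the package is already available. This is a sketch under stated assumptions: ensure_importable is a hypothetical helper, and it relies on the grass --config python_path call shown in the snippet above.

```python
import importlib
import importlib.util
import shutil
import subprocess
import sys


def ensure_importable(package="grass", executable="grass"):
    """Make package importable if possible; return True on success."""
    if importlib.util.find_spec(package) is not None:
        return True  # Already on sys.path, nothing to do.
    if shutil.which(executable) is None:
        return False  # No executable on PATH to ask for the location.
    try:
        python_path = subprocess.check_output(
            [executable, "--config", "python_path"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return False
    if python_path and python_path not in sys.path:
        sys.path.append(python_path)
    # Let the import system notice the newly added directory.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


print(ensure_importable())
```

The same script then keeps working unchanged if a future installation places the package alongside other system packages, because the first check short-circuits the lookup.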
Symlinking the GRASS python package into site-packages works, so it would appear that having GRASS symlink the /etc/python/grass into dist-packages during installation should also work. However, if you are working in a virtualenv or conda environment, then there still needs to be a way to install the GRASS python package. A full GRASS conda package would be incredible for many reasons, but also, lots of projects including GDAL, have their python bindings on PyPI but have a dependency to a system install of the software. Would be really nice to be able to pip install grass-python that uses a system installed GRASS.
It's still in my roadmap, it's one of my goals, and it will greatly simplify testing once it is possible to have grass available as a python package. You also have to think about cross-platform, dealing with users who would have conflicting dependencies, not breaking system packages, etc... We need to do it the right way. One of the problems is that most of the python code ends up using the grass libraries or one of the many module binary files. So the scope keeps getting bigger to make it right.
This is not forgotten, please see related discussion in #3661.
A full GRASS conda package would be incredible for many reasons...
A conda package is indeed in the works. @HuidaeCho can comment here more if needed, but the bottom line is that there is no prototype to test yet, but there is progress and a plan.
...lots of projects including GDAL, have their python bindings on PyPI but have a dependency to a system install of the software. Would be really nice to be able to pip install grass-python that uses a system installed GRASS.
I think this is still sort of an open question. Maybe we can revisit that after FHS and conda package if that's still an issue.
Thanks everyone for the information and great to see that this is on the road map. Maybe the pip install grass-python concept isn't an issue - the conda package would probably solve this for most use cases. Will be huge advantage for software accessibility in government and corporate settings because we basically all use conda.
Is your feature request related to a problem? Please describe.
On Linux, the grass package is not available in the system Python (i.e. not on path) after I install GRASS GIS into the system.
Describe the solution you'd like
After installing GRASS GIS to my system, I would like the grass package to be available without having to put it to the path manually.
For example on Ubuntu, I would like to use the following:
[...]
This means that the grass package would have to be in /usr/lib/python3/dist-packages where you can find, for example, the qgis package.
This is just how other things behave: GDAL (from osgeo import gdal), QGIS (if you are lucky enough with dynamic libraries), Python packages in general, ... In R I can import/load the bindings without a prior setup. Overall, the current behavior goes against user expectations. It is true that many of these are internally very different than GRASS GIS, but that's something users should not worry about.
Describe alternatives you've considered
Modify sys.path or PYTHONPATH
The current way is modifying sys.path, but for that I first need to find out where GRASS GIS lives (manually or programmatically):
[...]
If you are linting your code you will need to add also at least one of:
[...]
(To make it actually work, you will likely need to add also os.environ['GISBASE'] = gisbase, but that's a separate issue.)
Use grass-session
The following looks much better:
from grass_session import Session
import grass
However, I need to use two different packages, get the import order right, and most importantly, I need to install it from a completely different source. Additionally - and that may or may not be fixed by using dist-packages - grass_session/grass-session is not linked with the installation, so I may end up setting up GRASSBIN for it beforehand anyway so not that different from using PYTHONPATH.
Import from GRASS GIS
I can run the script inside GRASS GIS, but then I need to get GRASS GIS running somehow first, so I'm not using GRASS GIS from Python anymore, but more the other way around. This is great for writing GRASS GIS modules, but not for writing scripts and using GRASS GIS in them. It is also more difficult to apply this approach for running within some other environment such as JupyterLab or a Python IDE.
Using --exec from Python
I could use GRASS GIS through command line interface as in grass ... --exec, but I'm throwing away the useful grass.script functionality and I will eventually end up re-implementing it again just outside of GRASS GIS.
Additional context
As far as I understand, this would bring Linux installations to where the Windows one is now (cbc782674d0bac95be0df8f6b21e7366006be5f5, example).
Optionally, this can be the time to re-consider the name of the package (I used grass above), inclusion of GUI code, and placement of these in the source code.