AtlasOfLivingAustralia / galah-python

Query Living Atlases from Python
https://galah.ala.org.au/Python/
MIT License

status always skipped when using galah.atlas_occurrences() #205

JojoReikun closed this issue 1 year ago

JojoReikun commented 1 year ago

Hey,

I know this Python package is still new and under development, and I think it's awesome that you're developing a Python version of galah! I'm currently working on a Python project that uses ALA data and would love to use this package instead of downloading the data with the R package and then handling it in Python.

The problem I'm running into is that every download attempt fails at the point where galah checks whether the JSON response status is "skipped". I have tried many different taxa, filters, etc.

I have configured galah to use the Australian atlas and entered the email address I used to register with the ALA. The data_profile is set to "ALA".

The code I'm currently trying:

    df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

The error message:

Traceback (most recent call last):
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\__main__.py", line 11, in <module>
    main()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\__main__.py", line 7, in main
    download_ala_data()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\operations\galah_data_download.py", line 48, in download_ala_data
    df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\galah\atlas_occurrences.py", line 243, in atlas_occurrences
    if response.json()['status'] == "skipped":
KeyError: 'status'

Any feedback on this would be appreciated :) Thanks!
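The traceback shows that the JSON galah received had no status key at all, so the == "skipped" comparison on line 243 never ran, and the bare KeyError hides whatever the server actually sent back. A minimal sketch of a more defensive check, assuming the response has already been decoded into a dict; check_download_status is a hypothetical helper, not part of galah, and treating "skipped" as an error is an assumption:

```python
from typing import Any, Mapping


def check_download_status(payload: Mapping[str, Any]) -> Any:
    """Return the 'status' value from a decoded download response.

    Hypothetical helper: instead of failing with a bare KeyError when the
    key is missing, it raises an error that includes the full payload.
    """
    status = payload.get("status")
    if status is None:
        # Show everything the server sent; it may explain the failure.
        raise RuntimeError(f"Response has no 'status' field: {dict(payload)!r}")
    if status == "skipped":
        # Assumption: a skipped request is treated as an error here.
        raise RuntimeError("The download request was skipped")
    return status


# Made-up payloads, for illustration only.
print(check_download_status({"status": "finished"}))  # prints: finished

try:
    check_download_status({"message": "example"})
except RuntimeError as err:
    print(err)  # prints: Response has no 'status' field: {'message': 'example'}
```

With a check like this, the failing call would at least print what the server returned; later in this thread the cause turns out to be the email address passed to galah_config().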

acbuyan commented 1 year ago

Hey Jojo,

Thank you for taking the time to comment! As the person who has written the lion's share of the package, it's great to hear that there are users out there who are excited about it!

Hm, when I run it, it returns ~203,000 occurrence records. To help narrow this down, here's the code I used, followed by a couple of questions:

>>> galah.galah_config(atlas="Australia", email = "amanda.buyan@csiro.au")
>>> galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

Are you configuring galah as in the code above, or slightly differently? Is atlas_occurrences() the only function that isn't working?

JojoReikun commented 1 year ago

Hey,

Thanks for getting back to me so quickly! Good to hear that it works for you; it must be something on my end, then!

I have configured it as follows:

>>> galah.galah_config(atlas="Australia", email="schjojoultz@gmail.com", data_profile="ALA")
>>> df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

JojoReikun commented 1 year ago

I actually fixed it!

The ALA website says you can log in with an existing Google or Facebook account to skip the sign-up process, but that doesn't seem to be enough to use that email address in the galah_config() call!

I signed up again with a different email address and received the confirmation link. Using that email, I can now search occurrences successfully!

While I'm at it: which filter field would I use to restrict results to a specific database?

acbuyan commented 1 year ago

@JojoReikun here's some code

>>> galah.search_all(fields="data")
>>> galah.show_values(field="datasetID")
>>> galah.atlas_counts(filters="datasetID=SU")

There is also a datasetName field, but show_values() can't currently display its values (I'm fixing this, and the fix will be in the next release).
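In the atlas_counts() call above, the filter is a plain "field=value" string, so the "filter word" is the field name (datasetID) and the part after the equals sign is one of that field's values. A small sketch of building such strings from a dictionary; build_filters is a hypothetical helper that only formats text, and whether your galah version accepts a list of several filters rather than a single string is an assumption to check against the galah documentation:

```python
from typing import List, Mapping


def build_filters(criteria: Mapping[str, str]) -> List[str]:
    """Turn {field: value} pairs into "field=value" filter strings.

    Hypothetical helper: it does not check that the field or the value
    actually exists in the atlas (show_values() lists a field's values).
    """
    filters = []
    for field, value in criteria.items():
        if "=" in field:
            raise ValueError(f"Field name must not contain '=': {field!r}")
        filters.append(f"{field}={value}")
    return filters


print(build_filters({"datasetID": "SU"}))  # prints: ['datasetID=SU']
```

For a single filter, build_filters({"datasetID": "SU"})[0] gives exactly the string passed to atlas_counts() above.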

JojoReikun commented 1 year ago

Yep, I found the datasetName field earlier; that's the one I had tried.

The workaround you posted throws a few errors of its own. I'm trying to track down the cause by looking through the source code, but you may already have an idea, so I'm posting the messages here:

Since I need to find the datasetID first, I'm currently only running:

>>> galah.search_all(fields="data")
>>> galah.show_values(field="datasetID", verbose=True)

Error:

Traceback (most recent call last):
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 2 columns passed, passed data had 4 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\__main__.py", line 11, in <module>
    main()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\__main__.py", line 7, in main
    download_ala_data()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\operations\galah_data_download.py", line 50, in download_ala_data
    galah.show_values(field="datasetID", verbose=True)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\galah\show_values.py", line 94, in show_values
    tempdf = pd.DataFrame([entry['i18nCode'].split('.')],columns=['field','category'])
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 2 columns passed, passed data had 4 columns
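The failing line in show_values.py passes entry['i18nCode'].split('.') as a single row to a DataFrame with two columns, field and category, so any code with more than one dot yields more than two pieces (four, in this traceback) and pandas rejects the row. A minimal sketch of one possible fix, assuming the text before the first dot is the field name and everything after it belongs to the category: split at the first dot only. split_i18n_code and the sample strings are hypothetical; this is not galah's actual fix, and the real i18nCode values may be structured differently.

```python
from typing import List


def split_i18n_code(code: str) -> List[str]:
    """Split an i18nCode into [field, category] at the first dot only.

    str.partition always returns three parts, so the result always has
    exactly two entries, even if the category itself contains dots or
    there is no dot at all (the category is then empty).
    """
    field, _sep, category = code.partition(".")
    return [field, category]


# Hypothetical codes, for illustration only.
print("a.b.c.d".split("."))        # prints: ['a', 'b', 'c', 'd']
print(split_i18n_code("a.b.c.d"))  # prints: ['a', 'b.c.d']
print(split_i18n_code("nodot"))    # prints: ['nodot', '']
```

Each result then fits the existing call, e.g. pd.DataFrame([split_i18n_code(entry['i18nCode'])], columns=['field', 'category']).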