Deadwood-ai / deadwood-api

Main FastAPI application for the deadwood backend
GNU General Public License v3.0

final data integration process #29

Closed cmosig closed 2 months ago

cmosig commented 3 months ago

@mmaelicke

Do you need SSH access to our infra or is SFTP enough? Either is fine, just need to know.

JesJehle commented 3 months ago

@mmaelicke we will have 6 TB of orthophotos of California. How should we integrate this?

cmosig commented 3 months ago

FYI: current stats

mmaelicke commented 3 months ago

I think we need to discuss this in a short meeting next week. Not sure yet...

cmosig commented 3 months ago

Alright :+1: Please send quick message when ready for meeting

JesJehle commented 3 months ago

This is a first, super ugly draft of the upload script. Only for one GeoTIFF and label:

import requests
from supabase import create_client
import json
from pydantic_geojson import MultiPolygonModel, PolygonModel
import geopandas as gpd

BASE_URL = "http://0.0.0.0:8762"
GEOTIFF_FILE = "/Users/januschvajna-jehle/data/deadwood-example-data/orthos/uavforsat_2017_CFB044_ortho.tif"
LABELS_FILE = "/Users/januschvajna-jehle/data/deadwood-example-data/labels_aoi/uavforsat_2017_CFB044_ortho_polygons.gpkg"

# LABELS_FILE = 'uavforsat_2017_CFB044_labels.geojson'

SUPABASE_KEY = ""
SUPABASE_URL = ""

USER = "jesjehle@gmx.de"
PASSWORD = ""

client = create_client(SUPABASE_URL, SUPABASE_KEY)
client.auth.sign_in_with_password({"email": USER, "password": PASSWORD})

auth_response = client.auth.refresh_session()
session = auth_response.session
access_token = session.access_token
user_id = session.user.id

res = None

with open(GEOTIFF_FILE, "rb") as f:
    upload_res = requests.post(
        BASE_URL + "/datasets",
        files={"file": f},
        headers={"Authorization": f"Bearer {access_token}"},
    )

upload_res_json = upload_res.json()
dataset_id = upload_res_json["id"]
name = upload_res_json["file_name"]

# sample output
# {'id': 248,
#  'file_name': '1ba98c89-6b76-4402-bd7c-4680cb0a0c8b_uavforsat_2017_CFB044_ortho.tif',
#    'file_alias': 'uavforsat_2017_CFB044_ortho.tif',
#    'file_size': 1036120927, 'copy_time': 13.869181871414185,
#    'sha256': '4f8ca9a808442eae8f0a34d53a0ffcfa0f5698e4b5fed90fdd0de067529d4f82',
#      'bbox': 'BOX(8.116694192013465 48.17413731568594, 8.11973164880153 48.17625264529826)',
#      'status': 'pending', 'user_id': '6afa4242-681e-4611-a659-3287d06f6e49',
#      'created_at': '2024-08-14T13:56:19.065920+00:00'}

# dataset_id = 248
# name = "1ba98c89-6b76-4402-bd7c-4680cb0a0c8b_uavforsat_2017_CFB044_ortho.tif"

# generate metadata
metadata_res = requests.put(
    BASE_URL + f"/datasets/{dataset_id}/metadata",
    json={
        "dataset_id": dataset_id,
        "user_id": user_id,
        "name": name,
        "platform": "drone",
        "authors": "string",
        "license": "cc-by",
        "aquisition_year": 2017,
        "aquisition_month": None,
        "aquisition_day": None,
    },
    headers={"Authorization": f"Bearer {access_token}"},
)

# build cog
build_cog_res = requests.put(
    BASE_URL + f"/datasets/{dataset_id}/force-cog-build",
    json={
        # "overviews": 8,
        # "resolution": 0.04,
        # "profile": "jpeg",
        # "quality": 75,
        # "force_recreate": False,
    },
    headers={"Authorization": f"Bearer {access_token}"},
)

# build thumbnail
build_thumbnail_res = requests.put(
    BASE_URL + f"/datasets/{dataset_id}/build-thumbnail",
    json={
        # "force_recreate": False,
    },
    headers={"Authorization": f"Bearer {access_token}"},
)
# print(build_thumbnail_res)

aoi = gpd.read_file(LABELS_FILE, layer="aoi").to_json()
label = gpd.read_file(LABELS_FILE, layer="standing_deadwood").to_json()

aoi_json = json.loads(aoi)
# print("aoi_json:", aoi_json)
# print("aoi:", aoi_json["features"][0]["geometry"])
labels_json = json.loads(label)
# print("aoi:", aoi_json)
print("labels:", labels_json)

aoi_model = PolygonModel(
    type="Polygon", coordinates=aoi_json["features"][0]["geometry"]["coordinates"]
)
label_model = MultiPolygonModel(
    type="MultiPolygon",
    # collect all standing_deadwood polygons (the draft mistakenly reused aoi_json here)
    coordinates=[f["geometry"]["coordinates"] for f in labels_json["features"]],
)
# print(aoi_model)
# print(label_model)

res_labels = requests.put(
    BASE_URL + f"/datasets/{dataset_id}/labels",
    json={
        # "aoi": {"type": "Polygon", "coordinates": [[[null, null]]]},
        # "aoi": aoi_json["features"][0]["geometry"],
        "aoi": aoi_model.model_dump_json(),
        # "label": {"type": "MultiPolygon", "coordinates": [[[[null, null]]]]},
        # "label": labels_json["features"][0]["geometry"],
        "label": label_model.model_dump_json(),
        "label_source": "visual_interpretation",
        "label_quality": 0,
        "label_type": "point_observation",
    },
    headers={"Authorization": f"Bearer {access_token}"},
)

# LABELS_FILE = "/Users/januschvajna-jehle/data/deadwood-example-data/labels_aoi/uavforsat_2017_CFB044_polygons.gpkg"

# LABELS_FILE = 'uavforsat_2017_CFB044_labels.geojson'

@mmaelicke We have problems satisfying the pydantic multipolygon model. Any suggestions?

mmaelicke commented 3 months ago

Can't have a look right now. Have a look at migrate.py which does the same thing. In multipolygons you need to get the amount of braces right. Maybe that's the problem

I will look into it tomorrow
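The bracket nesting mmaelicke mentions can be checked quickly; this is a minimal sketch (plain dicts, no pydantic) of the GeoJSON coordinate depths:

```python
# GeoJSON coordinate nesting depths, the usual source of "brace" errors:
# Polygon:      [ring, ...]         where ring = [[x, y], ...]
# MultiPolygon: [[ring, ...], ...]  one extra level of brackets

ring = [[8.0, 48.0], [8.1, 48.0], [8.1, 48.1], [8.0, 48.0]]

polygon = {"type": "Polygon", "coordinates": [ring]}
multipolygon = {"type": "MultiPolygon", "coordinates": [[ring]]}

def depth(x):
    """Count nesting levels until the numeric coordinates are reached."""
    d = 0
    while isinstance(x, list):
        x = x[0]
        d += 1
    return d

assert depth(polygon["coordinates"]) == 3       # rings -> points -> numbers
assert depth(multipolygon["coordinates"]) == 4  # polygons -> rings -> points -> numbers
```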

cmosig commented 3 months ago

The library parses the dict locally fine, but when sending the exact same data structure to the server, it fails.

Attaching the .gpkg that we used for testing in the above code (need to unzip). uavforsat_2017_CFB044_ortho_polygons.zip

mmaelicke commented 3 months ago

Is there an error message?

The lines you used last create a JSON-encoded string (with model_dump_json); I would suggest using model_dump, which creates a dict. The json argument of the HTTP client methods will JSON-encode the payload again, so I think you end up with a GeoJSON geometry that was encoded twice, which the API can't parse anymore. I haven't tried that, but I can try it myself later if you want. I'm just not sure when I'll have time for that today...
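The double-encoding can be reproduced with the standard library alone; here `json.dumps` stands in for what `model_dump_json()` plus the client's `json=` argument do in sequence (a sketch, not the actual request code):

```python
import json

geometry = {"type": "Polygon",
            "coordinates": [[[8.0, 48.0], [8.1, 48.0], [8.1, 48.1], [8.0, 48.0]]]}

# model_dump_json() already returns a JSON string; the `json=` argument
# of the HTTP client then encodes the whole payload again:
body_double = json.dumps({"aoi": json.dumps(geometry)})
# the server now sees a string where it expects a GeoJSON object
assert isinstance(json.loads(body_double)["aoi"], str)

# model_dump() returns a plain dict, so the payload is encoded exactly once:
body_single = json.dumps({"aoi": geometry})
assert isinstance(json.loads(body_single)["aoi"], dict)
```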

cmosig commented 3 months ago

I think the error message was "Method not allowed" or something "not allowed".

@JesJehle can you send the error? I don't have the code.

cmosig commented 3 months ago

@mmaelicke @JesJehle if you send me the .env somehow, then I can reproduce (or fix) the error from my machine.


JesJehle commented 3 months ago

> I think the error message was "Method not allowed" or something "not allowed".
> @JesJehle can you send the error? I don't have the code.

error is: api-1 | INFO: 192.168.65.1:60103 - "PUT /datasets/267/labels HTTP/1.1" 405 Method Not Allowed

cmosig commented 3 months ago

Received the keys. Debugging this on Tuesday, next week.

mmaelicke commented 3 months ago

> I think the error message was "Method not allowed" or something "not allowed". @JesJehle can you send the error? I don't have the code.
>
> error is: api-1 | INFO: 192.168.65.1:60103 - "PUT /datasets/267/labels HTTP/1.1" 405 Method Not Allowed

Did not see it right away: the PUT HTTP verb is not allowed on the labels route. You need to use a POST verb. The reason is that this route is not idempotent: calling /labels twice will result in two label datasets. A PUT would be translated to an upsert, so calling, for example, the /metadata route twice would result in an update of the existing metadata.

So if you use requests.post for the labels, everything should be fine.
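Putting both fixes together (POST instead of PUT, plain dicts instead of JSON strings), the labels call from the draft script could look like the sketch below; BASE_URL, dataset_id, and access_token are placeholder values taken from the thread:

```python
import requests

BASE_URL = "http://0.0.0.0:8762"  # local dev server from the draft script
dataset_id = 267                  # placeholder
access_token = "..."              # obtained via the supabase sign-in as above

ring = [[8.0, 48.0], [8.1, 48.0], [8.1, 48.1], [8.0, 48.0]]
payload = {
    "aoi": {"type": "Polygon", "coordinates": [ring]},           # dict, not model_dump_json()
    "label": {"type": "MultiPolygon", "coordinates": [[ring]]},
    "label_source": "visual_interpretation",
    "label_quality": 0,
    "label_type": "point_observation",
}

try:
    # POST, not PUT: the labels route is not idempotent
    res = requests.post(
        f"{BASE_URL}/datasets/{dataset_id}/labels",
        json=payload,  # encoded exactly once by requests
        headers={"Authorization": f"Bearer {access_token}"},
    )
except requests.exceptions.ConnectionError:
    pass  # no API running in this sketch
```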

cmosig commented 3 months ago

Thanks, all routes work now! Will do the processing tomorrow.

Fun story: I deployed the docker container on our infrastructure for the processing this afternoon, and it locked everyone out of the server. So that was a fun walk of shame to the server room :) Turns out the default docker IP range was identical to the IP range through which we access the server, creating conflicts...

mmaelicke commented 3 months ago

Yeah, the ICEs used to use the same IP range, so I was wondering for ages why my tools only worked at home and not on the way to work :)

So should I empty the database and remove the old data?

cmosig commented 3 months ago

Removing the database means emptying all Supabase tables with names v1_*? I could do that myself; it would make the upload process easier, as I need to run some more tests.

mmaelicke commented 3 months ago

Yeah, emptying the tables and removing the associated files from the storage server.

I just did that, meaning as long as you test on a local system, the storage server stays empty. Then we can copy the processed files after you've finished. If you test the live API at data.deadtrees.earth/api/v1, I need to remove the testing files from the server again.

cmosig commented 3 months ago

okay sounds good

cmosig commented 3 months ago

v1_cogs still has data, that can also be emptied, right?

mmaelicke commented 3 months ago

yes. You can now empty and re-fill the tables as you wish until you start the actual processing

mmaelicke commented 3 months ago

I will have a look at GitHub and my mails later today to see if there are any issues left you need me to solve.

cmosig commented 3 months ago

Thanks. No rush. Tomorrow is also fine. I also have other things on my to-do list :)

cmosig commented 3 months ago

For some reason I cannot delete the rows in v1_datasets because of some foreign key constraint. Is that intended? All other v1 tables are empty.

cmosig commented 3 months ago

Works now. I don't know why.

cmosig commented 3 months ago

For reference, the summary of changes until now:

mmaelicke commented 3 months ago

Thanks for that! Good list. I will take care on the API side that these changes are implemented correctly on the main branch after my vacation.

cmosig commented 3 months ago

After almost exactly one hour of uploading, this happened:

api-1      | Traceback (most recent call last):
api-1      |   File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
api-1      |     result = await app(  # type: ignore[func-returns-value]
api-1      |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
api-1      |     return await self.app(scope, receive, send)
api-1      |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
api-1      |     await super().__call__(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/applications.py", line 123, in __call__
api-1      |     await self.middleware_stack(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
api-1      |     raise exc
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
api-1      |     await self.app(scope, receive, _send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in __call__
api-1      |     await self.app(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
api-1      |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
api-1      |     raise exc
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
api-1      |     await app(scope, receive, sender)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 754, in __call__
api-1      |     await self.middleware_stack(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 774, in app
api-1      |     await route.handle(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 295, in handle
api-1      |     await self.app(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 77, in app
api-1      |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
api-1      |     raise exc
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
api-1      |     await app(scope, receive, sender)
api-1      |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
api-1      |     response = await f(request)
api-1      |                ^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/fastapi/routing.py", line 278, in app
api-1      |     raw_response = await run_endpoint_function(
api-1      |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
api-1      |     return await dependant.call(**values)
api-1      |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/app/src/routers/upload.py", line 80, in upload_geotiff
api-1      |     user = verify_token(token)
api-1      |            ^^^^^^^^^^^^^^^^^^^
api-1      |   File "/app/src/supabase.py", line 50, in verify_token
api-1      |     response = client.auth.get_user(jwt)
api-1      |                ^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/gotrue/_sync/gotrue_client.py", line 580, in get_user
api-1      |     return self._request("GET", "user", jwt=jwt, xform=parse_user_response)
api-1      |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1      |   File "/usr/local/lib/python3.12/site-packages/gotrue/_sync/gotrue_base_api.py", line 123, in _request
api-1      |     raise handle_exception(e)
api-1      | gotrue.errors.AuthApiError: invalid JWT: unable to parse or verify signature, token has invalid claims: token is expired

Any way to extend the lifetime of the token? Could not immediately find that.

My quick fix is to get a new token before each request...

mmaelicke commented 3 months ago

Not easily. The supabase Python lib is pretty much garbage and crashes when you try auto-refresh. Logging in again each time is fine. You can also set a timer to 50 minutes.
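The 50-minute timer can be wrapped in a small helper so the rest of the upload loop does not need to care; `sign_in` is a placeholder for whatever re-login call you use (e.g. the sign_in_with_password flow from the script above):

```python
import time

TOKEN_LIFETIME = 50 * 60  # seconds; re-login a bit before the ~1 h JWT expiry

class TokenManager:
    """Caches the access token and re-authenticates at most once per
    TOKEN_LIFETIME, staying under the auth rate limit while avoiding
    expired JWTs."""

    def __init__(self, sign_in):
        self._sign_in = sign_in  # callable returning a fresh access token
        self._token = None
        self._issued_at = 0.0

    def token(self):
        if self._token is None or time.time() - self._issued_at >= TOKEN_LIFETIME:
            self._token = self._sign_in()
            self._issued_at = time.time()
        return self._token
```

Each request then uses `headers={"Authorization": f"Bearer {tm.token()}"}`, and a new login only happens when the cached token is older than 50 minutes.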

cmosig commented 3 months ago

Apart from this hiccup, the processing is running fine. 125 done, 1150 to go. ETA in 30 h.

cmosig commented 2 months ago

gotrue.errors.AuthApiError: Request rate limit reached

.... how can I circumvent this?

mmaelicke commented 2 months ago

Not possible. The only option is to decrease the number of logins from once per request to, e.g., once every 50 minutes.

cmosig commented 2 months ago

Oh man, ok. Is there a concrete number for the rate limit?

mmaelicke commented 2 months ago

I think it's 3 or 4 per hour. We can only customize it if we use our own SMTP server.

cmosig commented 2 months ago

Thanks. SMTP? How does e-mail play a role here?

mmaelicke commented 2 months ago

The whole supabase auth provider is one service. You can only customize any settings if you provide your own SMTP server. Many parts of the service rely on mail, e.g. for login, 2FA, OTP, password reset, etc.

cmosig commented 2 months ago

The COG generation process is stuck at one specific image and has been at 100% for 1h30min already. Other tifs of similar size are processed just fine within minutes.

Here is the tif: https://cloud.scadsai.uni-leipzig.de/index.php/s/e2ZapJy72PoH22Q

EDIT: by 100% I meant the CPU utilization

JesJehle commented 2 months ago

I think it would make sense to move some of the data to the file server. If we did this in batches, we could check for possible errors. I think this is better than waiting until all the files have been processed and then finding out that some of them are broken.

@cmosig @mmaelicke what do you think?

cmosig commented 2 months ago

You mean the cogs to check if the visualization works?

cmosig commented 2 months ago

I'd close this issue for now and open new more targeted issues if there are any.