Open davesteps opened 5 days ago
Also seeing this on Python 3.9.20, darwin, super-gradients version 3.7.1, with other models.
The site could be down. Going to https://sghub.deci.ai/models/yolox_s_coco.pth
directly yields:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotFound</Code>
<Message>The resource you requested does not exist</Message>
...
</Error>
@davesteps @ksil Check the latest commit: it changed the download URLs, but there is no release available yet. What I did was download the latest release and use it as a local library, with the URLs replaced.
I can confirm the same issue
from super_gradients.training import models
yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")
Downloading: "https://sghub.deci.ai/models/yolo_nas_l_coco.pth" to /root/.cache/torch/hub/checkpoints/yolo_nas_l_coco.pth
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-6-74936ab117f4> in <cell line: 2>()
1 from super_gradients.training import models
----> 2 yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")
/usr/lib/python3.10/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
641 class HTTPDefaultErrorHandler(BaseHandler):
642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643 raise HTTPError(req.full_url, code, msg, hdrs, fp)
644
645 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 404: Not Found
Yes, the old S3 bucket was taken down. For now, the workaround is to pip install from master:
pip install -U git+https://github.com/Deci-AI/super-gradients@e0ccacf8868ffa1296fa4f8407c03d2bc227312c
Sorry for the inconvenience.
This doesn't work, @BloodAxe, as there's a step in checkpoint_utils.py that splits the URL using the old location. That also needs to be updated to use the new base URL: "https://sg-hub-nv.s3.amazonaws.com/"
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/checkpoint_utils.py in load_pretrained_weights(model, architecture, pretrained_weights)
1590 pretrained_state_dict = torch.load(url.replace("file://", ""), map_location="cpu")
1591 else:
-> 1592 unique_filename = url.split("https://sghub.deci.ai/models/")[1].replace("/", "_").replace(" ", "_")
1593 map_location = torch.device("cpu")
1594 with wait_for_the_master(get_local_rank()):
IndexError: list index out of range
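To illustrate why the split on line 1592 fails once the download URLs point at the new bucket, here is a minimal sketch. `unique_filename_from_split` mirrors the expression shown in the traceback. `unique_filename_from_path` is a hypothetical host-independent alternative, not code from super-gradients. The full `NEW_URL` is an assumption built from the new base URL mentioned above plus the `models/` path and file name seen in the old URL.

```python
from urllib.parse import urlparse

# Prefix hard-coded in checkpoint_utils.py, as shown in the traceback.
OLD_PREFIX = "https://sghub.deci.ai/models/"
# Assumed example URL on the new bucket (illustrative, not verified).
NEW_URL = "https://sg-hub-nv.s3.amazonaws.com/models/yolo_nas_l_coco.pth"


def unique_filename_from_split(url: str, prefix: str = OLD_PREFIX) -> str:
    """Mirror of the line-1592 expression; requires `prefix` to occur in `url`."""
    # If `prefix` is absent, split() returns a one-element list and [1] raises.
    return url.split(prefix)[1].replace("/", "_").replace(" ", "_")


def unique_filename_from_path(url: str) -> str:
    """Hypothetical variant: derive the name from the URL path, ignoring the host."""
    path = urlparse(url).path.lstrip("/")
    if path.startswith("models/"):
        path = path[len("models/"):]
    return path.replace("/", "_").replace(" ", "_")


try:
    unique_filename_from_split(NEW_URL)
except IndexError as err:
    print("split on the old prefix failed:", err)

print(unique_filename_from_path(NEW_URL))
```

With the old prefix, any URL on a different host has nothing to split on, so index `[1]` does not exist. Deriving the name from the URL path avoids coupling the cache file name to a specific host.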
Yeah, I experienced the same error here:
IndexError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/checkpoint_utils.py in load_pretrained_weights(model, architecture, pretrained_weights)
1590 pretrained_state_dict = torch.load(url.replace("file://", ""), map_location="cpu")
1591 else:
-> 1592 unique_filename = url.split("https://sghub.deci.ai/models/")[1].replace("/", "_").replace(" ", "_")
1593 map_location = torch.device("cpu")
1594 with wait_for_the_master(get_local_rank()):
IndexError: list index out of range
Maybe you can try replacing the old URL with the updated one in your virtual environment. I tried it, and I saw no errors after replacing all occurrences. It's not ideal, but it will probably get you moving.
Thanks for following up. Your recommendation is what I ended up doing. My followup post was to document it for others.
Created a PR to fix this: https://github.com/Deci-AI/super-gradients/pull/2061 @BloodAxe @ofrimasad @shaydeci
I have solved the problem with the following modification.
Open pretrained_models.py
vi /usr/local/lib/python3.10/dist-packages/super_gradients/training/pretrained_models.py
Replaced part of the URL.
:%s/sghub.deci.ai/sg-hub-nv.s3.amazonaws.com/g
Open checkpoint_utils.py
vi /usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/checkpoint_utils.py
Corrected line 1592.
[before]
unique_filename = url.split("https://sghub.deci.ai/models/")[1].replace("/", "_").replace(" ", "_")
[after]
unique_filename = url.split("https://sg-hub-nv.s3.amazonaws.com/models/")[1].replace("/", "_").replace(" ", "_")
Thanks to everyone who commented above for their help in resolving this issue.
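For anyone who prefers not to edit the files by hand in vi, the same substitution can be sketched in Python. The two relative file paths and the host names come from the steps above. The helper names `replace_host_in_file` and `patch_package` are hypothetical, and the package root is assumed to be wherever super_gradients is installed in your environment.

```python
from pathlib import Path

OLD_HOST = "sghub.deci.ai"
NEW_HOST = "sg-hub-nv.s3.amazonaws.com"


def replace_host_in_file(path: Path, old: str = OLD_HOST, new: str = NEW_HOST) -> int:
    """Replace every occurrence of `old` with `new` in `path`; return the count."""
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        # Only rewrite the file when there is something to change.
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count


def patch_package(package_dir: Path) -> dict:
    """Patch the two files named in the thread, relative to the package root."""
    targets = [
        package_dir / "training" / "pretrained_models.py",
        package_dir / "training" / "utils" / "checkpoint_utils.py",
    ]
    # Missing files are skipped, since layouts may differ between versions.
    return {str(target): replace_host_in_file(target) for target in targets if target.exists()}
```

A possible call is `patch_package(Path("/usr/local/lib/python3.10/dist-packages/super_gradients"))`, using the install path from the traceback. Because the host name also appears inside the string on line 1592, the same replacement covers the `url.split(...)` fix. Reinstalling or upgrading the package overwrites these edits.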