What is the version and commit hash of your local CompressAI repository?
Thanks for your response. Can you please tell me how I can find the commit hash for my local CompressAI repo? I installed CompressAI using pip install compressai. The library version is 1.2.2; I am not sure where to find the commit hash.
Can you please show us the output of:
COMPRESSAI_PATH="$(python -c 'import compressai; print(compressai.__path__[0])')"
echo "$COMPRESSAI_PATH"
cd "$COMPRESSAI_PATH"
git rev-parse HEAD
It sounds like you installed compressai from PyPI, so my recent commits https://github.com/InterDigitalInc/CompressAI/commit/b64b0daf0a62a6dc38eb8768fcada074ce19f6a8 and https://github.com/InterDigitalInc/CompressAI/commit/14ac02c5182cbfee596abdfea98886be6247479a are probably not the cause of the problem. The issue is that the module.entropy_bottleneck buffers are not being pre-allocated with enough space, since the loading code expects the keys to be named entropy_bottleneck directly. Good news: the recent commits might actually fix the problem! Consider installing compressai from source instead:
cd ~
git clone https://github.com/InterDigitalInc/CompressAI compressai
cd compressai
pip install -U pip && pip install -e .
Alternatively, you can also just copy-paste the new load_state_dict function into CompressionModel, defined here:
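For orientation, here is a minimal sketch of the buffer-resizing idea behind that fix. It is not CompressAI's actual code, and the helper name load_with_resized_buffers is made up for illustration; the point is to pre-allocate each registered buffer (e.g. the entropy bottleneck's _quantized_cdf, _offset, and _cdf_length tables) to the shape stored in the checkpoint before the strict shape check in load_state_dict runs:

import torch
from torch import nn

def load_with_resized_buffers(model: nn.Module, state_dict):
    # Hypothetical sketch, not CompressAI's implementation: resize
    # registered buffers to the checkpoint's shapes so that
    # load_state_dict's strict size check passes.
    for name, buf in model.named_buffers():
        if name in state_dict and buf.shape != state_dict[name].shape:
            buf.resize_(state_dict[name].shape)
    model.load_state_dict(state_dict)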
Hi,
Thank you for your detailed answer.
Yes, you are right; I am not building CompressAI from source. The requested output is as follows:
COMPRESSAI_PATH = /anaconda/envs/azureml_py38/lib/python3.8/site-packages/compressai
As for the proposed solution: my CompressionModel class already looks the same as the one you mentioned. I copied it earlier, since there were some issues with multi-GPU training and copying it worked for me.
Please look at my project over here: Entropy Models/ Hyperprior Files
I think the issue arises from using multiple versions at the same time. I use PyPI to install compressai, but I redefine some files, e.g. entropy_models.py, in my own code, which might differ from the original PyPI version. Could this be a problem?
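One quick way to check for that kind of version mixing (a generic sketch, nothing project-specific) is to print where Python actually imports each module from:

import compressai
import compressai.entropy_models as em

# If these paths point into site-packages rather than into your project
# tree, your redefined entropy_models.py is not the one being used.
print(compressai.__file__)
print(em.__file__)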
DataParallel adds a module. prefix by default to every key in parallel_model.state_dict().
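A quick self-contained illustration of that prefix, using a toy model rather than anything from this project:

from torch import nn

model = nn.Linear(2, 2)
print(list(model.state_dict()))           # ['weight', 'bias']

parallel_model = nn.DataParallel(model)
print(list(parallel_model.state_dict()))  # ['module.weight', 'module.bias']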
Solutions:
1) Save the "non-parallel" model:
module = model.module if isinstance(model, DataParallel) else model
state_dict = module.state_dict()
torch.save(state_dict, "output.pth")  # note: torch.save takes the object first, then the path
2) Load checkpoint, rename all the keys, save new checkpoint:
ckpt = torch.load("input.pth")
print(ckpt.keys())
sd = "state_dict"  # I forgot what it was called.
print("\n".join(ckpt[sd].keys()))
# str.removeprefix requires Python >= 3.9; on older versions use
# k[len("module."):] if k.startswith("module.") else k.
ckpt[sd] = {k.removeprefix("module."): v for k, v in ckpt[sd].items()}
torch.save(ckpt, "output.pth")  # object first, then path
3) Same as (2), but do it before loading the state_dict instead.
4) Load the model weights before wrapping the model in DataParallel.
I would say (1) is the best and least likely to cause problems in the future, and maybe do (4) as well; a minimal sketch of (4) follows below.
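Here, build_model and "checkpoint.pth" are hypothetical stand-ins for your own model factory and checkpoint file:

import torch
from torch import nn

model = build_model()  # hypothetical factory for the plain (non-parallel) model
state_dict = torch.load("checkpoint.pth")  # saved from model.module, as in (1)
model.load_state_dict(state_dict)

# Wrap only after the weights are loaded, so the "module." prefix never
# has to be dealt with when reading the checkpoint.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)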
Yeah, you were right. Everything works now. I am attaching my code in case someone else faces a similar problem.
import torch

def remove_prefix(text, prefix):
    # Strip the given prefix (e.g. "module.") from a key, if present.
    return text[len(prefix):] if text.startswith(prefix) else text

def load_checkpoint(path, model):
    snapshot = torch.load(path)
    itr = snapshot['itr']
    print(f'Loaded from {itr} iterations')
    dict_ = {}
    for k, v in snapshot["model"].items():
        dict_[remove_prefix(k, "module.")] = v
    snapshot["model"] = dict_
    model.load_state_dict(snapshot['model'])
    # Return the iteration count and model so train.py can resume from them.
    return itr, model
and in train.py, we have:
model = model.to(device)
optimizer, aux_optimizer = configure_optimizers(model, config)

if args.resume:
    itr, model = load_checkpoint(args.resume, model)
    logger.load_itr(itr)

if torch.cuda.device_count() > 1:
    model = CustomDataParallel(model)
Bug
Hi, I am trying to resume training of a pretrained model (https://github.com/micmic123/QmapCompression), which is based on CompressAI. The pretrained model uses the hyperprior architecture, with some additions.
To Reproduce
Expected behavior
The model should load without errors.
Environment
Please copy and paste the output from
python3 -m torch.utils.collect_env