IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0

For offline use, what needs to be changed in the config files, and which files need to be downloaded? #75

Closed · xiaowenhe closed this 4 weeks ago

xiaowenhe commented 1 year ago

For offline use, what needs to be changed in the config files? Which files (e.g. config.json) need to be downloaded, and where should they be placed? When running on a server without internet access, I get: OSError: We couldn't connect to https://huggingface.co to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.

Thanks!

SlongLiu commented 1 year ago

Take a look at the interfaces in the code: the config and pretrained-model paths can be specified directly. Just download the files manually and put them in the appropriate locations.

xiaowenhe commented 1 year ago

I'm not very clear about the overall workflow. Could you spell out how the code can directly specify the config and pretrained-model paths? Thanks!

SlongLiu commented 1 year ago

Taking grounded_sam_demo.py as an example: use --grounded_checkpoint to set the pretrained checkpoint path and --config to set the config file path.
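For reference, a full invocation with explicit paths might look like this (it is the same command a user runs later in this thread; adjust the paths to wherever you placed the files):

```
python grounded_sam_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo1.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "bear" \
  --device "cuda"
```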

xiaowenhe commented 1 year ago

Sorry, I may not have expressed myself clearly: my question is not about how to pass the config when running the code. The problem is that partway through execution, at line 17 of GroundingDINO/groundingdino/util/get_tokenlizer.py, tokenizer = AutoTokenizer.from_pretrained(text_encoder_type), the code needs to download some files from the internet. That is what I asked at the start: the error says files such as config.json need to be downloaded, but where should they be placed? On a server without internet access it raises: OSError: We couldn't connect to https://huggingface.co to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json. Thanks!

Andy1621 commented 1 year ago

You can first run the huggingface-related code to download the files; such calls usually look like xx.from_pretrained().
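A minimal sketch of that pre-download step, assuming the text encoder is bert-base-uncased (as in this thread's error messages): run it once on a machine with internet access so the files land in the local Hugging Face cache, which later offline runs can reuse.

```python
# Run once while online: populates the local Hugging Face cache so later
# offline runs of from_pretrained("bert-base-uncased") hit the cache
# instead of the network.
from transformers import AutoTokenizer, BertModel

AutoTokenizer.from_pretrained("bert-base-uncased")
BertModel.from_pretrained("bert-base-uncased")
```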

LeanFly commented 1 year ago

> Sorry, I may not have expressed myself clearly ... at line 17 of GroundingDINO/groundingdino/util/get_tokenlizer.py, tokenizer = AutoTokenizer.from_pretrained(text_encoder_type) needs to download some files from the internet […]

Indeed, I've hit this too: because every run makes requests to the remote huggingface site, connection failures are common. I've downloaded the required model files, but I haven't yet figured out how to change the code to load the local copies.

Andy1621 commented 1 year ago

Once you have downloaded it, the model will be loaded automatically from the local cache.

Zalberth commented 1 year ago

Maybe you need this link: download the whole model by installing huggingface_hub, then replace the input parameter of from_pretrained with your local model directory.
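A sketch of that approach, assuming the huggingface_hub package is installed (pip install huggingface_hub): snapshot_download pulls the whole bert-base-uncased repo once, and the returned local directory is then passed to from_pretrained instead of the model name.

```python
# Download the full model repo once (requires network), then point
# from_pretrained at the resulting local directory for offline use.
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, BertModel

local_dir = snapshot_download(repo_id="bert-base-uncased")  # returns a local path
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)
```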

cxliu0 commented 1 year ago

Here is my workaround to run the model without connecting to huggingface:

  • Step 1: download the necessary files listed in huggingface-bert-base-uncased, including config.json, flax_model.msgpack, pytorch_model.bin, tf_model.h5, tokenizer.json, tokenizer_config.json, and vocab.txt
  • Step 2: put the files downloaded in Step 1 into a local folder. For example, the local folder could be Grounded-Segment-Anything/huggingface/bert-base-uncased
  • Step 3: modify text_encoder_type in get_tokenlizer.py#L17 and get_tokenlizer.py#L23 to your local folder (defined in Step 2); a sketch of this edit is shown below
  • Step 4: run the model and enjoy it

levylll commented 1 year ago

> Here is my workaround to run the model without connecting to huggingface: […]

That's quite a hassle... mainly because the last step requires changing the source code. How do we keep in sync with master afterwards? (Though the chance of that code changing again is small.)
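For concreteness, here is a minimal sketch of the Step 3 edit, using the example folder from Step 2 (a sketch, not the upstream get_tokenlizer.py verbatim):

```python
# get_tokenlizer.py, sketch of the Step 3 edit: point both loaders at the
# local folder from Step 2 instead of the hub name "bert-base-uncased".
from transformers import AutoTokenizer, BertModel

LOCAL_BERT = "Grounded-Segment-Anything/huggingface/bert-base-uncased"  # example path from Step 2

def get_tokenlizer(text_encoder_type):
    # originally: AutoTokenizer.from_pretrained(text_encoder_type)  # L17
    return AutoTokenizer.from_pretrained(LOCAL_BERT)

def get_pretrained_language_model(text_encoder_type):
    if text_encoder_type == "bert-base-uncased":
        # originally: BertModel.from_pretrained(text_encoder_type)  # L23
        return BertModel.from_pretrained(LOCAL_BERT)  # pass the directory, not a .bin file
    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))
```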

levylll commented 1 year ago

@SlongLiu could you add a config option for this? Give us a way to pass the local directory through at initialization time.

LeanFly commented 1 year ago

Locate the load_model_hf function, add a debug statement of your choice to print the local path of cache_file, copy the model files somewhere convenient, then comment out the cache_file line and point cache_file at the local path.
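A sketch of that change, assuming cache_file was originally obtained via hf_hub_download inside load_model_hf (the local path below is a hypothetical example):

```python
# Inside load_model_hf: comment out the hub download and point cache_file
# at a local copy of the checkpoint instead.
import torch

# cache_file = hf_hub_download(repo_id=repo_id, filename=filename)  # original: needs network
cache_file = "./weights/groundingdino_swint_ogc.pth"  # hypothetical local copy
print("loading checkpoint from:", cache_file)         # the debug print suggested above
checkpoint = torch.load(cache_file, map_location="cpu")
```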


rentainhe commented 1 year ago

> Here is my workaround to run the model without connecting to huggingface: […]

We will highlight it in our issues! Thanks for your solution; we will refine the code in a future release.

zzh805780186 commented 1 year ago

> Here is my workaround to run the model without connecting to huggingface: […]

good job! Thanks

nomoneyExpection commented 1 year ago

Here is my modified code:

```python
from transformers import AutoTokenizer, BertModel, RobertaModel, RobertaTokenizerFast, BertTokenizer

def get_tokenlizer(text_encoder_type):
    if not isinstance(text_encoder_type, str):
        if hasattr(text_encoder_type, "text_encoder_type"):
            text_encoder_type = text_encoder_type.text_encoder_type
        elif text_encoder_type.get("text_encoder_type", False):
            text_encoder_type = text_encoder_type.get("text_encoder_type")
        else:
            raise ValueError(
                "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
            )
    print("final text_encoder_type: {}".format(text_encoder_type))

    tokenizer_path = "Grounded-Segment-Anything/huggingface/bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
    return tokenizer

def get_pretrained_language_model(text_encoder_type):
    if text_encoder_type == "bert-base-uncased":
        model_path = "Grounded-Segment-Anything/huggingface/bert-base-uncased/pytorch_model.bin"
        return BertModel.from_pretrained(model_path)
    if text_encoder_type == "roberta-base":
        return RobertaModel.from_pretrained(text_encoder_type)
    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))
```

But I still get an error:

```
(gsa) D:\forwork\Grounded-Segment-Anything>python grounded_sam_demo.py --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo1.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --text_prompt "bear" --device "cuda"
D:\Anaconda3\envs\gsa\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Traceback (most recent call last):
  File "grounded_sam_demo.py", line 181, in <module>
    model = load_model(config_file, grounded_checkpoint, device=device)
  File "grounded_sam_demo.py", line 46, in load_model
    model = build_model(args)
  File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\__init__.py", line 17, in build_model
    model = build_func(args)
  File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\GroundingDINO\groundingdino.py", line 372, in build_groundingdino
    model = GroundingDINO(
  File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\GroundingDINO\groundingdino.py", line 107, in __init__
    self.tokenizer = get_tokenlizer.get_tokenlizer(text_encoder_type)
  File "d:\forwork\grounded-segment-anything\groundingdino\groundingdino\util\get_tokenlizer.py", line 45, in get_tokenlizer
    tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
  File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\tokenization_utils_base.py", line 1654, in from_pretrained
    fast_tokenizer_file = get_fast_tokenizer_file(
  File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\tokenization_utils_base.py", line 3486, in get_fast_tokenizer_file
    all_files = get_list_of_files(
  File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\file_utils.py", line 2103, in get_list_of_files
    return list_repo_files(path_or_repo, revision=revision, token=token)
  File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 103, in inner_f
    return f(*args, **kwargs)
  File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'Grounded-Segment-Anything/huggingface/bert-base-uncased'. Use repo_type argument if needed.
```

Is there any good solution?
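A note for readers hitting the same HFValidationError: when the string passed to from_pretrained does not exist as a directory relative to the current working directory, transformers falls back to treating it as a Hub repo id, which may contain at most one slash. A sketch of a likely fix (an assumption, not confirmed in this thread): resolve the folder to an absolute path, and pass the directory itself rather than pytorch_model.bin to BertModel.from_pretrained.

```python
# Hypothetical fix sketch: use an absolute, existing directory for both loaders.
import os
from transformers import BertModel, BertTokenizer

local_dir = os.path.abspath("huggingface/bert-base-uncased")  # adjust to your layout
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)  # the directory, not pytorch_model.bin
```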

littleanapple commented 11 months ago

```
export http_proxy="http://192.168.30.127:4780"
export https_proxy="http://192.168.30.127:4780"
```

littleanapple commented 11 months ago

set proxy in docker

liaokaiyao commented 1 month ago

> Here is my workaround to run the model without connecting to huggingface: […]

thank you!!!