Open woshiyyya opened 1 year ago
I have just uploaded the ptb dataset on onedrive.
For inference, you may make a file like this (add dummy tags in the 7th, 8th, and 9th columns) and follow the instructions:
1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root
2\tI\t_\t_\t_\t_\t_\t0\troot\t0:root
3\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
4\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
5\tlocation\t_\t_\t_\t_\t_\t0\troot\t0:root
6\twonderful\t_\t_\t_\t_\t_\t0\troot\t0:root
7\tand\t_\t_\t_\t_\t_\t0\troot\t0:root
7.1\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
8\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
9\tneighbors\t_\t_\t_\t_\t_\t0\troot\t0:root
10\tvery\t_\t_\t_\t_\t_\t0\troot\t0:root
11\tkind\t_\t_\t_\t_\t_\t0\troot\t0:root
12\t.\t_\t_\t_\t_\t_\t0\troot\t0:root
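A file in the dummy-tag format above can be generated mechanically from a tokenized sentence. A minimal sketch, assuming the 10-column tab-separated layout shown in the example (`make_dummy_conllu` is a hypothetical helper name, not part of the repo):

```python
def make_dummy_conllu(tokens):
    """Build dummy CoNLL-U-style lines for an untagged sentence,
    mirroring the example layout above: id, token, five "_" columns,
    then the dummy tags "0", "root", "0:root"."""
    lines = []
    for i, tok in enumerate(tokens, start=1):
        fields = [str(i), tok] + ["_"] * 5 + ["0", "root", "0:root"]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n\n"  # a blank line ends the sentence

sentence = "But I found the location wonderful .".split()
print(make_dummy_conllu(sentence), end="")
```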
Hi Xinyu,
Thanks for uploading the data!
I created a folder named data and put in it a train.tsv file with the demo case you provided.
Run:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/ptb_parsing_model.yaml --parse --target_dir data --keep_order
But still got an error:
2022-09-07 02:59:16,391 Reading data from /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified
2022-09-07 02:59:16,391 Train: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
2022-09-07 02:59:16,391 Test: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/test.conllu
2022-09-07 02:59:16,391 Dev: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/dev.conllu
Traceback (most recent call last):
File "train.py", line 85, in <module>
config = ConfigParser(config,all=args.all,zero_shot=args.zeroshot,other_shot=args.other,predict=args.predict)
File "/projects/clio1/probing/ACE/flair/config_parser.py", line 63, in __init__
self.corpus: ListCorpus=self.get_corpus
File "/projects/clio1/probing/ACE/flair/config_parser.py", line 329, in get_corpus
current_dataset=getattr(datasets,corpus)(tag_to_bioes=self.target)
File "/projects/clio1/probing/ACE/flair/datasets.py", line 360, in __init__
train = UniversalDependenciesDataset(data_folder/'train_modified.conllu', in_memory=in_memory, add_root=True)
File "/projects/clio1/probing/ACE/flair/datasets.py", line 1006, in __init__
assert path_to_conll_file.exists()
AssertionError
Do you know how to fix that?
Have you checked whether the dataset is in the correct place?
Hi Xinyu, is there something wrong with the data format provided? I find that the line token = Token(fields[1], head_id=int(fields[6])) gives me ValueError: invalid literal for int() with base 10: '_'.
So I guess the 0th column is the token id, the 1st column is the token, the 2nd-5th columns are "_", the 6th column is 0 (a dummy head), and the 7th, 8th, and 9th columns are the dummy tags "_", "root", and "0:root".
Is that right?
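The guessed layout can be sanity-checked before feeding a file to the parser. A small sketch, assuming (per the error above) that the head id is read from fields[6] (`check_line` is a hypothetical helper, not part of the repo):

```python
def check_line(line):
    """Validate one tab-separated data line against the guessed layout:
    10 columns, with an integer head id in column 6."""
    fields = line.rstrip("\n").split("\t")
    assert len(fields) == 10, f"expected 10 columns, got {len(fields)}"
    head = int(fields[6])  # raises ValueError if this column is "_"
    return fields[0], fields[1], head

# A line with the dummy head "0" in column 6, as guessed above:
print(check_line("1\tBut\t_\t_\t_\t_\t0\t_\troot\t0:root"))  # ('1', 'But', 0)
```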
After I changed the data format, I still face the same problem. Have you resolved it?
Have you ensured the path /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
exists? If not, you may download the data above and put it at this path.
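One way to verify this before launching training is a quick pathlib check. A sketch, with the file names taken from the log above (`missing_files` is a hypothetical helper name):

```python
from pathlib import Path

def missing_files(data_folder):
    """Return which of the expected .conllu files are absent from data_folder."""
    names = ("train_modified.conllu", "dev.conllu", "test.conllu")
    return [n for n in names if not (Path(data_folder) / n).exists()]

# The folder from the log above:
print(missing_files(Path.home() / ".flair/datasets/ptb_3.3.0_modified"))
```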
Yes, I have done that, and I solved this problem: the target_dir also needs to contain dev/test datasets. But now I can only parse the dataset on the CPU (very slow); it fails to run on the GPU.
It shows me :
Traceback (most recent call last):
File "train.py", line 378, in
I tried changing
sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
into
sequence_output, pooled_output, hidden_states = self.model(input_ids.cuda(), attention_mask=mask.cuda(), inputs_embeds = inputs_embeds)
but it still shows me the same error.
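For errors caused by tensors and model weights living on different devices, a common PyTorch pattern is to move the model and all of its inputs to one device up front rather than calling .cuda() on individual tensors. A generic sketch with stand-in model and tensors, not the actual ACE code:

```python
import torch

# Pick one device and use it everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # stand-in for self.model
input_ids = torch.randn(3, 4).to(device)   # stand-in for the real inputs

# All tensors the forward pass touches must be on the same device as the model.
output = model(input_ids)
print(output.shape, output.device.type)
```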
T T,
You may try to uncomment these lines https://github.com/Alibaba-NLP/ACE/blob/7033e91b5428bfbf33c75a4c81f2336f03115ed8/train.py#L226-L238
Hi Xinyu, I have resolved the problem and applied ACE to parse my data successfully. Thanks for your help!
Hi there!
I am trying to test with your pretrained dependency parsing model. However, I cannot find your processed PTB dataset. Can you share it with a link?
Also, I am wondering how to run inference on my own data. For example, how can I feed in one sentence and get its tagging result?