Closed by dkarmon 5 years ago
Sure @dkarmon, let me look into that and get back to you.
Hi @dkarmon, please ignore the single method in the convert_tf_checkpoint_to_pytorch notebook.
I was able to convert these weights successfully for PyTorch. It seems to be an issue with writing the weights to file.
@MeRajat apologies for that. I continued to investigate the problem, and it seems to be in the bert_config.json
file and not in the weights file.
Are you able to run the following command without any errors?
tmp_d = torch.load(parameters.BERT_CONFIG_FILE, map_location='cpu')
I think I found the problem.
tmp_d = torch.load(parameters.BERT_CONFIG_FILE, map_location='cpu')
is supposed to load the converted weight file, not the config file. Just replace parameters.BERT_CONFIG_FILE
with parameters.BERT_WEIGHTS
and it should work.
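To make this kind of mix-up fail early, a small guard can be placed in front of the load call. The following is only a sketch: `is_json_file` and `load_converted_weights` are hypothetical helper names, not part of the repository, and it assumes the config is a plain JSON file (like bert_config.json) while the converted weights are a binary checkpoint.

```python
import json


def is_json_file(path):
    """Return True if the file at `path` parses as JSON (e.g. bert_config.json)."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
        return True
    except ValueError:
        # Covers both UnicodeDecodeError (binary data) and json.JSONDecodeError.
        return False


def load_converted_weights(weights_path):
    """Load a converted checkpoint, refusing paths that are really JSON configs."""
    if is_json_file(weights_path):
        raise ValueError(
            f"{weights_path} looks like a JSON config, not a weights file"
        )
    # Imported lazily so the check above also works without PyTorch installed.
    import torch
    return torch.load(weights_path, map_location="cpu")
```

With such a guard, passing parameters.BERT_CONFIG_FILE instead of parameters.BERT_WEIGHTS would raise a readable ValueError rather than an unpickling error from deep inside torch.load.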
Also, note that the label set in data_load.py is missing a few labels ('S-Chemical', 'S-Disease', 'E-Disease', 'E-Chemical'), which makes the new_train.py file fail:
class HParams:
    def __init__(self, vocab_type):
        # Label sets per dataset; 'bc5cdr' now includes the S- and E- tags.
        self.VOCAB_DICT = {
            'bc5cdr': ('<PAD>', 'B-Chemical', 'O', 'B-Disease', 'I-Disease', 'I-Chemical',
                       'S-Chemical', 'S-Disease', 'E-Disease', 'E-Chemical'),
            'bionlp3g': ('<PAD>', 'B-Amino_acid', 'B-Anatomical_system', 'B-Cancer', 'B-Cell',
                         'B-Cellular_component', 'B-Developing_anatomical_structure',
                         'B-Gene_or_gene_product', 'B-Immaterial_anatomical_entity',
                         'B-Multi-tissue_structure', 'B-Organ', 'B-Organism',
                         'B-Organism_subdivision', 'B-Organism_substance',
                         'B-Pathological_formation', 'B-Simple_chemical', 'B-Tissue',
                         'I-Amino_acid', 'I-Anatomical_system', 'I-Cancer', 'I-Cell',
                         'I-Cellular_component', 'I-Developing_anatomical_structure',
                         'I-Gene_or_gene_product', 'I-Immaterial_anatomical_entity',
                         'I-Multi-tissue_structure', 'I-Organ', 'I-Organism',
                         'I-Organism_subdivision', 'I-Organism_substance',
                         'I-Pathological_formation', 'I-Simple_chemical', 'I-Tissue', 'O')
        }
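A quick way to catch missing labels like these before training is to compare the tags that occur in the data against the label tuple. This is a rough sketch, not code from the repository: `find_unknown_tags` is a hypothetical helper, `old_vocab` is an assumed version of the 'bc5cdr' tuple without the S- and E- tags, and `sample` is made-up data.

```python
from collections import Counter


def find_unknown_tags(tag_sequences, vocab):
    """Count tags that occur in the data but are absent from `vocab`."""
    known = set(vocab)
    unknown = Counter()
    for tags in tag_sequences:
        for tag in tags:
            if tag not in known:
                unknown[tag] += 1
    return unknown


# Assumed label tuple without the S-/E- tags (hypothetical, for illustration).
old_vocab = ('<PAD>', 'B-Chemical', 'O', 'B-Disease', 'I-Disease', 'I-Chemical')
# Made-up tag sequences standing in for a dataset.
sample = [['B-Disease', 'E-Disease', 'O'], ['S-Chemical', 'O']]

missing = find_unknown_tags(sample, old_vocab)
print(sorted(missing))  # ['E-Disease', 'S-Chemical']
```

Running a check like this over the training file would list every tag that needs to be added to the label tuple, instead of failing partway through training.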
@dkarmon, in my case I didn't use the E-Disease tags, which is why they are missing from there. 👍
Hi @MeRajat, first, thanks for the great repository. I followed the Preparation instructions in the README file and converted the BioBERT weight file (specifically, the pre-trained weights of BioBERT (Wiki+Books+PubMed+PMC)) to be PyTorch-compatible using the script.
I keep getting the following error while trying to train the model on any valid dataset:
It seems that neither version of the converted weight file is serialized correctly. Please advise.