bhavaygg commented 3 years ago

Describe the bug

ModuleNotFoundError: No module named 'biome.text'; 'biome' is not a package

I tried installing from source and from pip, but both give the same error.

To Reproduce

File "biome.py", line 53, in <module>
    from biome.text import Pipeline
  File "D:\desktop2.0\prot\biome.py", line 53, in <module>
    from biome.text import Pipeline
ModuleNotFoundError: No module named 'biome.text'; 'biome' is not a package

OS environment

OS: Windows 10
- biome.text Version [e.g. 1.0.0]

Additional context

biome --help works in cmd in both cases, and the Python version is 3.7.

dcfidalgo commented 3 years ago

Thanks for reporting!

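Honestly, we do not have much experience with Windows systems, but could you try simply renaming your biome.py script (to my_biome.py, for example)? Most likely it is just a namespace clash: a script named biome.py shadows the installed biome package, so the import resolves to your script instead.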
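
To check for this kind of clash before renaming anything, something like the following sketch might help. It only looks for a file called <package_name>.py in the directories you pass in. The helper name find_shadowing_script is made up for illustration and is not part of biome.text.

```python
import sys
from pathlib import Path


def find_shadowing_script(package_name, search_dirs):
    """Return the first '<package_name>.py' file found in search_dirs, else None.

    A plain .py file with the same name as an installed package wins the import
    when its directory comes earlier on sys.path, and Python then reports
    "'<name>' is not a package" as soon as a submodule is requested.
    """
    for directory in search_dirs:
        candidate = Path(directory) / f"{package_name}.py"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # When running a script, sys.path[0] is the script's own directory.
    hit = find_shadowing_script("biome", [sys.path[0] or "."])
    print(hit if hit else "no shadowing script found")
```

If this prints a path inside your project folder, renaming that file should make the import find the installed package again.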

bhavaygg commented 3 years ago

Thanks, that fixed it. But I am now running into another error.

df=pd.read_csv("bert_train.csv")
df_train, df_test = train_test_split(df, test_size=0.1, random_state=RANDOM_SEED)
df_val, df_test = train_test_split(df_test, test_size=0.5, random_state=RANDOM_SEED)

pipeline_dict = {
    "name": "prot",
    "tokenizer": {
        "text_cleaning": {
            "rules": ["strip_spaces"]
        }
    },
    "features": {
        "word": {
            "embedding_dim": 64,
            "lowercase_tokens": True,
        },
        "char": {
            "embedding_dim": 32,
            "lowercase_characters": True,
            "encoder": {
                "type": "gru",
                "num_layers": 1,
                "hidden_size": 32,
                "bidirectional": True,
            },
            "dropout": 0.1,
        },
    },
    "head": {
        "type": "TextClassification",
        "labels": ["0","1"],
        "pooler": {
            "type": "gru",
            "num_layers": 1,
            "hidden_size": 32,
            "bidirectional": True,
        },
        "feedforward": {
            "num_layers": 1,
            "hidden_dims": [32],
            "activations": ["relu"],
            "dropout": [0.0],
        },
    },       
}

from biome.text import Pipeline
pl = Pipeline.from_config(pipeline_dict)
from biome.text.configuration import VocabularyConfiguration, WordFeatures
print(df_train)
vocab_config = VocabularyConfiguration(sources=[df_train], min_count={WordFeatures.namespace: 1000})
pl.create_vocabulary(vocab_config)

My dataframe looks like this

                                                  text  label
371  MKK KKH KHH HHH HHH HHH HHH HHL HLV LVP VPR PR...      1
257  GSH SHM HMG MGS GSP SPN PNS NSP SPL PLK LKD KD...      1

and the stack trace is

File "biomeee.py", line 58, in <module>
    pl.create_vocabulary(vocab_config)
  File "D:\Anaconda\envs\myenv\lib\site-packages\biome\text\pipeline.py", line 750, in create_vocabulary
    vocab = self._extend_vocabulary(vocabulary.create_empty_vocabulary(), config)
  File "D:\Anaconda\envs\myenv\lib\site-packages\biome\text\pipeline.py", line 690, in _extend_vocabulary
    instances_vocab = Vocabulary.from_instances(
  File "D:\Anaconda\envs\myenv\lib\site-packages\allennlp\data\vocabulary.py", line 292, in from_instances
    instance.count_vocab_items(namespace_token_counts)
AttributeError: 'str' object has no attribute 'count_vocab_items'

dcfidalgo commented 3 years ago

We do not support working with pandas DataFrames directly, but you can always create a Dataset from a DataFrame:

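from biome.text import Dataset
from biome.text.configuration import VocabularyConfiguration, WordFeatures

# Wrap the DataFrame in a biome.text Dataset before passing it as a source
train_ds = Dataset.from_pandas(df_train)

vocab_config = VocabularyConfiguration(sources=[train_ds], min_count={WordFeatures.namespace: 1000})
pl.create_vocabulary(vocab_config)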
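
As for the AttributeError in the stack trace: iterating over a pandas DataFrame yields its column labels, not its rows. That is probably why the vocabulary code ends up calling count_vocab_items on a plain str. A small sketch with a toy DataFrame (the values are made up, only the column names match the report):

```python
import pandas as pd

# Toy frame with the same two columns as in the report (values are made up).
df_train = pd.DataFrame({"text": ["MKK KKH KHH", "GSH SHM HMG"], "label": [1, 1]})

# Iterating a DataFrame walks over its column labels, not its rows.
items = list(df_train)
print(items)  # ['text', 'label']

# Each item is a plain str, so it has no count_vocab_items method.
print(hasattr(items[0], "count_vocab_items"))  # False
```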

I assume you installed biome.text from master, which we recommend at the moment (until the new release):

pip install -U git+https://github.com/recognai/biome-text.git

Let me know if I can be of any further help!

dvsrepo commented 3 years ago

Closing this. @Chokerino, feel free to create another issue if you have more questions.

argilla-io / biome-text

[BUG] ModuleNotFoundError: No module named 'biome.text'; 'biome' is not a package #463
