euagendas / m3inference

A deep learning system for demographic inference (gender, age, and individual/organization), trained on a massive Twitter dataset using profile images, screen names, names, and biographies
http://www.euagendas.org
GNU Affero General Public License v3.0

Error with using infer_id() #10

Open anudeike opened 3 years ago

anudeike commented 3 years ago

Hi! I'm using this code for a research project, thank you for providing it.

I am trying to run an inference with infer_id, and I just replicated the example in the FAQ. Here's what my code looks like:

import os
import pprint

from dotenv import load_dotenv
from m3inference import M3Twitter

load_dotenv()

# authentication
twitter_app_auth = {
    'consumer_key': os.getenv('TWITTER_API_KEY'),
    'consumer_secret': os.getenv('TWITTER_API_SECRET'),
    'access_token': os.getenv('TWITTER_ACCESS_TOKEN'),
    'access_token_secret': os.getenv('TWITTER_ACCESS_SECRET'),
}

# init the api
inferenceTwitter = M3Twitter()
inferenceTwitter.twitter_init(api_key=twitter_app_auth['consumer_key'],
                              api_secret=twitter_app_auth['consumer_secret'],
                              access_token=twitter_app_auth['access_token'],
                              access_secret=twitter_app_auth['access_token_secret'])

pprint.pprint(inferenceTwitter.infer_id("2631881902"))

The traceback that I received was pretty confusing:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

RuntimeError: DataLoader worker (pid(s) 57016) exited unexpectedly

I'm not sure where to find the freeze_support() function call and how to deal with using the fork() child processes.

zijwang commented 3 years ago

I just tried your example and it seems the code works on my end.

>>> from m3inference import M3Twitter
>>> m3twitter = M3Twitter()
09/25/2020 12:09:58 - INFO - m3inference.m3inference -   Version 1.1.0
09/25/2020 12:09:58 - INFO - m3inference.m3inference -   Running on cpu.
09/25/2020 12:09:58 - INFO - m3inference.m3inference -   Will use full M3 model.
09/25/2020 12:09:59 - INFO - m3inference.m3inference -   Model full_model exists at <...>/m3/models/full_model.mdl.
09/25/2020 12:09:59 - INFO - m3inference.utils -   Checking MD5 for model full_model at <...>/m3/models/full_model.mdl
09/25/2020 12:09:59 - INFO - m3inference.utils -   MD5s match.
09/25/2020 12:10:01 - INFO - m3inference.m3inference -   Loaded pretrained weight at <...>/m3/models/full_model.mdl
09/25/2020 12:10:01 - INFO - m3inference.m3twitter -   Dir <...>/m3/cache does not exist. Creating now.
09/25/2020 12:10:01 - INFO - m3inference.m3twitter -   Dir <...>/m3/cache created.
>>> m3twitter.twitter_init_from_file('<...>/auth_example.txt')
True
>>> m3twitter.infer_id("2631881902")
09/25/2020 12:10:08 - INFO - m3inference.m3twitter -   Results not in cache. Fetching data from Twitter for id 2631881902.
09/25/2020 12:10:08 - INFO - m3inference.m3twitter -   GET /users/show.json?id=2631881902
09/25/2020 12:10:11 - INFO - m3inference.dataset -   1 data entries loaded.
Predicting...: 100%|██████████████████████████████████████████████| 1/1 [00:07<00:00,  7.60s/it]
{'input': {'description': 'Bundeskanzlerin', 'id': '2631881902', 'img_path': '<...>2631881902_224x224.jpg', 'lang': 'de', 'name': 'Angela Merkel', 'screen_name': 'AngelaMerkeICDU'}, 'output': {'gender': {'male': 0.0015, 'female': 0.9985}, 'age': {'<=18': 0.0, '19-29': 0.0, '30-39': 0.0001, '>=40': 0.9999}, 'org': {'non-org': 0.996, 'is-org': 0.004}}}

Could you provide the full log and traceback? If you are using a GPU could you also try to run the code on CPU only?
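
One general way to force CPU-only execution, independent of m3inference itself, is to hide all CUDA devices before PyTorch is imported. This is only a minimal sketch, assuming the library runs on PyTorch (the DataLoader in the traceback suggests so); force_cpu_only is a made-up helper name, not part of m3inference.

```python
import os


def force_cpu_only() -> None:
    """Hide all CUDA devices from libraries that are imported after this call."""
    # An empty value means "no visible GPUs" to the CUDA runtime.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


force_cpu_only()
# Import torch / m3inference only after this point so the setting takes effect.
```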

computermacgyver commented 3 years ago

Thanks, @anudeike , for reporting this. Could you let us know what operating system and version of Python you are using?

My guess, @zijwang, is that this may be a Windows-specific bug. The error mentions fork (Linux/macOS only) and freeze_support, which I believe is specific to making multiprocessing work on Windows. I've not run anything but Linux for so long that I'm unsure.
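
To check the fork/spawn difference on a given machine, here is a minimal sketch that reports which start method Python's multiprocessing uses by default. Windows only offers spawn, and spawn re-imports the main module in every child process, which would explain why unguarded top-level code triggers the bootstrapping error. The helper name describe_start_method is made up for illustration.

```python
import multiprocessing as mp
import sys


def describe_start_method() -> str:
    """Return the default multiprocessing start method on this platform."""
    # allow_none=False fixes the platform default if none has been set yet.
    return mp.get_start_method(allow_none=False)


if __name__ == "__main__":
    # Windows reports "spawn"; the value on Linux and macOS depends on the Python version.
    print(sys.platform, describe_start_method())
```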

anudeike commented 3 years ago

@computermacgyver Hi and you're welcome.

I am using Windows 10 and Python 3.8.

computermacgyver commented 3 years ago

Hi @anudeike , I've done a bit of digging on this and think this is down to how you call the library in your script. Can you try this example: https://github.com/euagendas/m3inference/blob/win/scripts/m3twitter.py

If you follow the instructions in the README to create a file called auth.txt based on the structure of auth_example.txt in that same directory, then you should be able to run:

python m3twitter.py --auth auth.txt --screen-name computermacgyve --skip-cache
10/27/2020 19:22:55 - INFO - m3inference.m3inference -   Version 1.1.1
10/27/2020 19:22:55 - INFO - m3inference.m3inference -   Running on cpu.
10/27/2020 19:22:55 - INFO - m3inference.m3inference -   Will use full M3 model.
10/27/2020 19:22:56 - INFO - m3inference.m3inference -   Model full_model exists at /home/shale/m3/models/full_model.mdl.
10/27/2020 19:22:56 - INFO - m3inference.utils -   Checking MD5 for model full_model at /home/shale/m3/models/full_model.mdl
10/27/2020 19:22:56 - INFO - m3inference.utils -   MD5s match.
10/27/2020 19:22:56 - INFO - m3inference.m3inference -   Loaded pretrained weight at /home/shale/m3/models/full_model.mdl
10/27/2020 19:22:56 - INFO - m3inference.m3twitter -   skip_cache is True. Fetching data from Twitter for computermacgyve.
10/27/2020 19:22:56 - INFO - m3inference.m3twitter -   GET /users/show.json?screen_name=computermacgyve
10/27/2020 19:23:02 - INFO - m3inference.dataset -   1 data entries loaded.
Predicting...: 100%|██████████████████████████████| 1/1 [00:00<00:00,  2.30it/s]
{'input': {'description': 'Sr Research Fellow @oiioxford, Director of Research '
                          '@meedan, Fellow @turinginst.・widening access to '
                          'quality '
                          'info・multilingualism・mobilization・NLP・agenda '
                          'setting',
           'id': '19854920',
           'img_path': '/home/shale/m3/cache/19854920_224x224.jpg',
           'lang': 'en',
           'name': 'Scott Hale',
           'screen_name': 'computermacgyve'},
 'output': {'age': {'19-29': 0.0117,
                    '30-39': 0.1219,
                    '<=18': 0.0014,
                    '>=40': 0.865},
            'gender': {'female': 0.0003, 'male': 0.9997},
            'org': {'is-org': 0.0002, 'non-org': 0.9998}}}

What you'll see in that file is that

  1. I make sure the contents of my main program are within an if __name__ == "__main__": block
  2. The first statement of that block is freeze_support()
  3. I have imported that freeze_support function from multiprocessing, i.e., from multiprocessing import freeze_support
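
Applied to the script from the original report, those three points could look roughly like the sketch below. It is not the contents of the linked m3twitter.py. It only reuses the calls shown earlier in this thread (M3Twitter(), twitter_init_from_file, infer_id), and AUTH_FILE, USER_ID, and run_inference are names chosen here for illustration.

```python
import os
import pprint
from multiprocessing import freeze_support

AUTH_FILE = "auth.txt"  # assumed path; same structure as auth_example.txt
USER_ID = "2631881902"  # the id used earlier in this thread


def run_inference(user_id: str, auth_file: str) -> dict:
    """Create the M3Twitter client and run inference for a single user id."""
    # Imported lazily so that merely importing this module stays cheap
    # when worker processes re-import it under the spawn start method.
    from m3inference import M3Twitter

    m3twitter = M3Twitter()
    m3twitter.twitter_init_from_file(auth_file)
    return m3twitter.infer_id(user_id)


if __name__ == "__main__":
    # First statement of the guarded block; has no effect unless the
    # program has been frozen into a Windows executable.
    freeze_support()
    if os.path.exists(AUTH_FILE):
        pprint.pprint(run_inference(USER_ID, AUTH_FILE))
    else:
        print(f"Auth file {AUTH_FILE!r} not found; skipping inference.")
```

The important part is that nothing that starts worker processes runs at module top level: on Windows each DataLoader worker re-imports the script, and the guard keeps that re-import from starting inference again.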