AnthonyMRios / pymetamap

Python wrapper for MetaMap

TypeError: a bytes-like object is required, not 'str' #5

Closed tianranzhang closed 7 years ago

tianranzhang commented 7 years ago

Hi, I just discovered the pymetamap package today and I am new to Python. I am using this package to analyze clinical trial inclusion criteria retrieved from a MySQL database as dict objects. This is the object I used for an experimental analysis:

result_set={'criteria': 'Male physicians, ages 40 to 84. No history of stroke, myocardial infarction, cancer, or renal disease. No contraindications to aspirin or beta-carotene. No current usage of aspirin or Vitamin A tables greater than once per week.'}

I first converted the dict object to a string:

str_json = json.dumps(result_set)

When I followed the example usage code and tried to run the line

concepts,error = mm.extract_concepts(str_json)

it returns the error:

TypeError: a bytes-like object is required, not 'str'

Then I tried to convert it to bytes by running:

data=str.encode(str_json)

and checked the type of the newly generated object:

type(data)

It shows that data is already of type 'bytes'.

So I ran the concept extraction code again:

concepts,error = mm.extract_concepts(data)

And it still returns the same error asking for a 'bytes-like object'.

Could you please help me figure out what is wrong here? Is there anything I should look into other than the data type conversion (since I already converted the data type)?

I am currently using Python 3 (Anaconda environment).

Thank you so much!!

Tianran

AnthonyMRios commented 7 years ago

extract_concepts takes a list of strings as input. What happens when you try:

concepts,error = mm.extract_concepts([result_set['criteria']])

tianranzhang commented 7 years ago

Thanks for your reply. I tried to use [result_set['criteria']] and it returns the same error:

TypeError: a bytes-like object is required, not 'str'


As the original dict objects are stored in a list named 'result_set', I tried to create a new list of strings ('result_string') by fetching the 'criteria' element of each object. I ran the following code:

result_string=[]
for i in range(0, len(result_set)-1):
    result_string.append(result_set[i]['criteria'])
concepts, error = mm.extract_concepts(results_set)

Again it returns the error:

TypeError: a bytes-like object is required, not 'str'
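Independently of the TypeError, the loop as posted stops one element short (range(0, len(result_set)-1) skips the last row) and the final call passes results_set rather than the new list. A minimal sketch of building the list of strings, where the names rows and criteria_strings are hypothetical and only the 'criteria' key comes from the thread:

```python
def criteria_strings(rows):
    """Collect the 'criteria' text of every row dict into a list of str."""
    # A comprehension visits every row, so the last one is not dropped.
    return [row["criteria"] for row in rows]


# Hypothetical sample rows shaped like the database results in the post.
rows = [
    {"criteria": "Male physicians, ages 40 to 84."},
    {"criteria": "No contraindications to aspirin or beta-carotene."},
]

sentences = criteria_strings(rows)
# sentences is the list that would be handed to mm.extract_concepts(...)
```

This only prepares the input; it does not by itself resolve the Python 3 error discussed in the rest of the thread.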

AnthonyMRios commented 7 years ago

Can you give a complete code example that I can use to try to recreate the error? Also, what OS are you using and what version of MetaMap?

When I try to run the following code here is what I get:

mm = MetaMap.get_instance('/opt/public_mm16/public_mm/bin/metamap16')

result_set={'criteria': 'Male physicians, ages 40 to 84. No history of stroke, myocardial infarction, cancer, or renal disease. No contraindications to aspirin or beta-carotene. No current usage of aspirin or Vitamin A tables greater than once per week.'}

concepts,error = mm.extract_concepts([result_set['criteria']])

Processing 00000000.tx.1: 'Male physicians, ages 40 to 84. No history of stroke, myocardial infarction, cancer, or renal disease. No contraindications to aspirin or beta-carotene. No current usage of aspirin or Vitamin A tables greater than once per week.'

for concept in concepts:
    print(concept)

ConceptMMI(index='00000000', mm='MMI', score='30.42', preferred_name='Beta Carotene', cui='C0053396', semtypes='[orch,phsu,vita]', trigger='[".BETA.-CAROTENE"-tx-1-"beta-carotene"-noun-0]', location='TX', pos_info='139/13', tree_codes='D02.455.326.271.665.202.123;D02.455.426.392.368.367.379.249.050;D02.455.849.131.123;D23.767.261.050')
ConceptMMI(index='00000000', mm='MMI', score='28.77', preferred_name='Vitamin A', cui='C0042839', semtypes='[orch,phsu,vita]', trigger='["VITAMIN A"-tx-1-"Vitamin A"-noun-0]', location='TX', pos_info='185/9', tree_codes='D02.455.326.271.665.202.495.818;D02.455.426.392.368.367.379.249.700.860;D02.455.849.131.495.818;D23.767.261.700.860;x.x.x.x')
ConceptMMI(index='00000000', mm='MMI', score='26.00', preferred_name='N-acetyl-S-(alpha-methyl-4-(2-methylpropyl)benzeneacetyl)cysteine 4-(nitrooxy)butyl ester', cui='C1454756', semtypes='[orch]', trigger='["NO-aspirin"-tx-1-"No aspirin"-noun-0]', location='TX', pos_info='[104/2,128/7],[154/2,174/7]', tree_codes='x.x.x.x')
ConceptMMI(index='00000000', mm='MMI', score='16.05', preferred_name='Cerebrovascular accident', cui='C0038454', semtypes='[dsyn]', trigger='["STROKE"-tx-1-"stroke"-noun-1]', location='TX', pos_info='47/6', tree_codes='C10.228.140.300.775;C14.907.253.855')
ConceptMMI(index='00000000', mm='MMI', score='14.64', preferred_name='Glycosylation End Products, Advanced', cui='C0162574', semtypes='[bacs,orch]', trigger='["AGEs"-tx-1-"ages"-verb-0]', location='TX', pos_info='18/4', tree_codes='D12.776.643.500')
ConceptMMI(index='00000000', mm='MMI', score='14.64', preferred_name='Kidney Diseases', cui='C0022658', semtypes='[dsyn]', trigger='["RENAL DISEASE, NOS"-tx-1-"renal disease"-noun-1]', location='TX', pos_info='89/13', tree_codes='C12.777.419;C13.351.968.419')
ConceptMMI(index='00000000', mm='MMI', score='14.64', preferred_name='Myocardial Infarction', cui='C0027051', semtypes='[dsyn]', trigger='["Infarction, Myocardial"-tx-1-"myocardial infarction"-noun-1]', location='TX', pos_info='55/21', tree_codes='C14.280.647.500;C14.907.585.500')
ConceptMMI(index='00000000', mm='MMI', score='13.14', preferred_name='Physicians', cui='C0031831', semtypes='[prog]', trigger='["Physicians"-tx-1-"physicians"-noun-0]', location='TX', pos_info='6/10', tree_codes='M01.526.485.810;N02.360.810')
ConceptMMI(index='00000000', mm='MMI', score='9.88', preferred_name='contraindications aspect', cui='C0079164', semtypes='[qlco]', trigger='["contraindications"-tx-1-"contraindications"-noun-0]', location='TX', pos_info='107/17', tree_codes='x.x.x')
ConceptMMI(index='00000000', mm='MMI', score='9.81', preferred_name='Males', cui='C0086582', semtypes='[orga]', trigger='["MALE"-tx-1-"Male"-noun-0]', location='TX', pos_info='1/4', tree_codes='x.x.x')
ConceptMMI(index='00000000', mm='MMI', score='6.79', preferred_name='Data Table', cui='C1706074', semtypes='[inpr]', trigger='["Tables"-tx-1-"tables"-noun-0]', location='TX', pos_info='195/6', tree_codes='V02.930')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Age', cui='C0001779', semtypes='[orga]', trigger='["AGE"-tx-1-"ages"-verb-0]', location='TX', pos_info='18/4', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Beta carotene measurement', cui='C0696105', semtypes='[lbpr]', trigger='["Beta Carotene"-tx-1-"beta-carotene"-noun-0]', location='TX', pos_info='139/13', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Cancer Genus', cui='C0998265', semtypes='[euka]', trigger='["Cancer"-tx-1-"cancer"-noun-0]', location='TX', pos_info='78/6', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Electrocardiogram: myocardial infarction (finding)', cui='C0428953', semtypes='[fndg]', trigger='["MYOCARDIAL INFARCTION"-tx-1-"myocardial infarction"-noun-1]', location='TX', pos_info='55/21', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Greater Than', cui='C0439093', semtypes='[qnco]', trigger='["Greater Than"-tx-1-"greater than"-adj-0]', location='TX', pos_info='202/12', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Malignant Neoplasms', cui='C0006826', semtypes='[neop]', trigger='["CANCER"-tx-1-"cancer"-noun-1]', location='TX', pos_info='78/6', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Myocardial Infarction ECG Assessment', cui='C3810814', semtypes='[diap]', trigger='["Myocardial Infarction"-tx-1-"myocardial infarction"-noun-0]', location='TX', pos_info='55/21', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Myocardial infarction:Finding:Point in time:^Patient:Ordinal', cui='C2926063', semtypes='[clna]', trigger='["Myocardial infarction"-tx-1-"myocardial infarction"-noun-0]', location='TX', pos_info='55/21', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Once a week', cui='C0558293', semtypes='[tmco]', trigger='["Once per week"-tx-1-"once per week"-noun-0]', location='TX', pos_info='215/13', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='5.18', preferred_name='Primary malignant neoplasm', cui='C1306459', semtypes='[neop]', trigger='["Cancer"-tx-1-"cancer"-noun-1]', location='TX', pos_info='78/6', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.77', preferred_name='No history of', cui='C0332122', semtypes='[qlco]', trigger='["No history of"-tx-1-"No history of"-noun-0]', location='TX', pos_info='33/13', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.63', preferred_name='Table - furniture', cui='C0039224', semtypes='[mnob]', trigger='["tables"-tx-1-"tables"-noun-0]', location='TX', pos_info='195/6', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.59', preferred_name='/40', cui='C0439509', semtypes='[tmco]', trigger='["/40"-tx-1-"40"-integer-0]', location='TX', pos_info='23/2', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.59', preferred_name='40%', cui='C3842587', semtypes='[qnco]', trigger='["40%"-tx-1-"40"-integer-0]', location='TX', pos_info='23/2', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.56', preferred_name='Usage', cui='C0457083', semtypes='[ftcn]', trigger='["Usage"-tx-1-"usage"-noun-0]', location='TX', pos_info='165/5', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.53', preferred_name='Vitamin A Drug Class', cui='C3714656', semtypes='[phsu,vita]', trigger='["VITAMIN A"-tx-1-"Vitamin A"-noun-0]', location='TX', pos_info='185/9', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.53', preferred_name='Vitamin A [EPC]', cui='C2825076', semtypes='[vita]', trigger='["Vitamin A"-tx-1-"Vitamin A"-noun-0]', location='TX', pos_info='185/9', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.50', preferred_name='Male Gender, Self Report', cui='C1706180', semtypes='[qlco]', trigger='["Male"-tx-1-"Male"-noun-0]', location='TX', pos_info='1/4', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.50', preferred_name='Male Phenotype', cui='C1706428', semtypes='[qlco]', trigger='["Male"-tx-1-"Male"-noun-0]', location='TX', pos_info='1/4', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.50', preferred_name='Male, Self-Reported', cui='C1706429', semtypes='[orga]', trigger='["Male"-tx-1-"Male"-noun-0]', location='TX', pos_info='1/4', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.43', preferred_name='Current (present time)', cui='C0521116', semtypes='[tmco]', trigger='["CURRENT"-tx-1-"current"-adj-0]', location='TX', pos_info='157/7', tree_codes='')
ConceptMMI(index='00000000', mm='MMI', score='3.43', preferred_name='Electrical Current', cui='C1705970', semtypes='[npop]', trigger='["Current"-tx-1-"current"-adj-0]', location='TX', pos_info='157/7', tree_codes='')

tianranzhang commented 7 years ago

Sorry for the lack of info. I am using Mac OS X 10.9.5 and metamap16. I opened a new file and ran this code after importing the necessary packages:

mm = MetaMap.get_instance('/Users/zhangtianran/Downloads/public_mm2/bin/metamap16')
result_set={'criteria': 'Male physicians, ages 40 to 84. No history of stroke, myocardial infarction, cancer, or renal disease. No contraindications to aspirin or beta-carotene. No current usage of aspirin or Vitamin A tables greater than once per week.'}
concepts,error = mm.extract_concepts([result_set['criteria']])

And it's still returning the same error. I am starting to suspect that I am not setting up the MetaMap server correctly... Do I need to make sure that the MetaMap server is running before I use this package?

Thanks.

AnthonyMRios commented 7 years ago

I have only run this on Linux. Let me try setting up MetaMap on my Mac and see if I can reproduce the error.

tianranzhang commented 7 years ago

Thank you so much!!

AnthonyMRios commented 7 years ago

Tianranzhang, are you trying to use this code on Python 3.x? If you are not using Python 2.7, I think that may be the problem. I have only tested with Python 2.7.

-edit- I just tested on my Mac and everything seems to work. Make sure you have run ./bin/skrmedpostctl start and ./bin/wsdserverctl start

If you are on a Python 3.x version, I think the issue may be using Python 3.x instead of 2.7.
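The two server commands mentioned above can also be launched from Python. This is only a sketch: the helper names server_commands and start_servers and the metamap_root argument are hypothetical, and it assumes the control scripts live in the bin directory of the MetaMap install, as the relative paths in the comment suggest.

```python
import os
import subprocess


def server_commands(metamap_root):
    """Build the argv lists for the two control scripts named in the thread."""
    bin_dir = os.path.join(metamap_root, "bin")
    return [
        [os.path.join(bin_dir, "skrmedpostctl"), "start"],
        [os.path.join(bin_dir, "wsdserverctl"), "start"],
    ]


def start_servers(metamap_root):
    """Run each start command in turn; raises CalledProcessError on failure."""
    for command in server_commands(metamap_root):
        subprocess.check_call(command)


# Example (not executed here): start_servers("/opt/public_mm16/public_mm")
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the commands before anything is started.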

AnthonyMRios commented 7 years ago

Were you able to get this running?

tianranzhang commented 7 years ago

Yes, it worked under Python 2.7. I am still trying to get it fixed for Python 3.5...
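For context on why the Python version matters: Python 3 separates str from bytes, and writing a str to a file opened in binary mode raises exactly this TypeError. The sketch below reproduces the message and shows the usual fix of encoding first. It assumes the input text is written to a temporary file in binary mode before MetaMap is invoked; the thread does not confirm where the error is raised, and the helper names are hypothetical.

```python
import tempfile


def write_error_message():
    """Return the error text raised when str is written to a binary-mode file."""
    with tempfile.NamedTemporaryFile() as handle:  # default mode is 'w+b'
        try:
            handle.write("Male physicians, ages 40 to 84.")  # str, not bytes
        except TypeError as exc:
            return str(exc)
    return None  # reached on Python 2, where str is already a byte string


def write_sentences(sentences, encoding="utf-8"):
    """Write one sentence per line, encoding any str to bytes first."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        for sentence in sentences:
            if not isinstance(sentence, bytes):
                sentence = sentence.encode(encoding)
            handle.write(sentence + b"\n")
        return handle.name  # caller is responsible for deleting the file
```

On Python 2.7 the same write succeeds because str is already a byte string, which is consistent with the code working there.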

AnthonyMRios commented 7 years ago

Sure. I hope it works for you under 2.7 for your current needs. I will update the code to work with Python 3.5 when I have a chance.

AnthonyMRios commented 7 years ago

#6

tianranzhang commented 7 years ago

Thank you! I will look into a potential solution as well.

Thanks for developing this wrapper!

AnthonyMRios commented 7 years ago

This should now work under Python 3.5.