bence-szalai / Data-preparation-for-CTD2

GNU General Public License v3.0

s is not defined #2

Closed allaway closed 4 years ago

allaway commented 4 years ago

For this chunk:

# For compounds where inhibitor, blocker, antagonist etc. are in the MoA column,
# we assume they are inhibitory compounds, so we mark them with -1 in the meta_matrix.
# For other compounds, we assume they are activators and mark them with +1.
# This is probably not a perfect way to assess inhibitory/activating state, but good for a first try.
inhibitory_words=set(['inhibitor','blocker','antagonist','inihibitor']) #inihibitor is just a typo
for i in drug_metadata.index:
    if list(drug_metadata.index).index(i) % 100==0:
        print('Done for %i drugs' %list(drug_metadata.index).index(i))
    brd=drug_metadata.loc[i,'broad_id']
    if not pd.isnull(drug_metadata.loc[i,'moa']):
        moas=drug_metadata.loc[i,'moa'].split('|')
    else:
        moas=[]
    if not pd.isnull(drug_metadata.loc[i,'target']):
        s=1
        targets=drug_metadata.loc[i,'target'].split('|')
        if len(set((' '.join(moas)).split())&inhibitory_words)>0:
            s=-1
    else:
        targets=[]
    meta_matrix.loc[brd,moas]=1
    meta_matrix.loc[brd,targets]=s

I get the error

Done for 0 drugs
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-29-ff5ed4cb33f5> in <module>()
     20         targets=[]
     21     meta_matrix.loc[brd,moas]=1
---> 22     meta_matrix.loc[brd,targets]=s

NameError: name 's' is not defined

Apologies for all of the issues. I am an R person with only passing Python familiarity...

Defining s in the final else branch seems to fix this, but it's not clear to me whether this is an appropriate fix:

for i in drug_metadata.index:
    if list(drug_metadata.index).index(i) % 100==0:
        print('Done for %i drugs' %list(drug_metadata.index).index(i))
    brd=drug_metadata.loc[i,'broad_id']
    if not pd.isnull(drug_metadata.loc[i,'moa']):
        moas=drug_metadata.loc[i,'moa'].split('|')
    else:
        moas=[]
    if not pd.isnull(drug_metadata.loc[i,'target']):
        s=1
        targets=drug_metadata.loc[i,'target'].split('|')
        if len(set((' '.join(moas)).split())&inhibitory_words)>0:
            s=-1
    else:
        s=0
        targets=[]
    meta_matrix.loc[brd,moas]=1
    meta_matrix.loc[brd,targets]=s

Let me know what you think!

bence-szalai commented 4 years ago

Yes, it was missing. Strange that it did not raise an error for me; I had probably already defined s while testing the code. Anyway, thank you very much for testing it. Putting s=0 there solves the problem :)
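
The fix agreed on above amounts to giving s a value on every path through the loop body, so the final assignment never sees an unbound name. A minimal sketch of the same idea without pandas follows. The function name parse_drug, the name sign (standing in for s), and the use of None for a missing value are assumptions made for illustration; the original notebook checks missing values with pd.isnull.

```python
# Same word list as in the notebook chunk quoted above.
INHIBITORY_WORDS = {"inhibitor", "blocker", "antagonist", "inihibitor"}


def parse_drug(moa, target):
    """Return (moas, targets, sign) for one drug row.

    Hypothetical helper: None stands in for a missing (NaN) cell.
    """
    moas = moa.split("|") if moa is not None else []
    targets = target.split("|") if target is not None else []

    # Bind the sign before any branching so it is always defined.
    sign = 0
    if targets:
        words = set(" ".join(moas).split())
        sign = -1 if words & INHIBITORY_WORDS else 1
    return moas, targets, sign
```

Called as parse_drug(None, None), the helper returns two empty lists and a sign of 0, which corresponds to the case that raised the NameError in the original chunk.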

allaway commented 4 years ago

Thanks! Writing out the CSV in the next chunk also gives me a weird encoding error. Are you familiar with this?

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-33-3bfaa550dd86> in <module>()
----> 1 meta_matrix.to_csv('../results/drugs_meta.csv',sep=',')

/home/ec2-user/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/core/generic.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                                  doublequote=doublequote,
   3019                                  escapechar=escapechar, decimal=decimal)
-> 3020         formatter.save()
   3021 
   3022         if path_or_buf is None:

/home/ec2-user/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/io/formats/csvs.pyc in save(self)
    170                 self.writer = UnicodeWriter(f, **writer_kwargs)
    171 
--> 172             self._save()
    173 
    174         finally:

/home/ec2-user/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/io/formats/csvs.pyc in _save(self)
    272     def _save(self):
    273 
--> 274         self._save_header()
    275 
    276         nrows = len(self.data_index)

/home/ec2-user/anaconda3/envs/py2/lib/python2.7/site-packages/pandas/io/formats/csvs.pyc in _save_header(self)
    240         if not has_mi_columns or has_aliases:
    241             encoded_labels += list(write_cols)
--> 242             writer.writerow(encoded_labels)
    243         else:
    244             # write out the mi

UnicodeEncodeError: 'ascii' codec can't encode character u'\xd0' in position 9: ordinal not in range(128)

allaway commented 4 years ago

utf-8 encoding seems to fix this!
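
The thread does not say how the encoding was set. Presumably it was done through the encoding argument that appears in the to_csv signature in the traceback, for example encoding='utf-8'. The sketch below illustrates the underlying behaviour with the standard library only, written for Python 3 (the traceback comes from Python 2.7, where the default codec was ASCII). The column label is a made-up example that contains the same character, \xd0.

```python
import csv
import os
import tempfile

# Hypothetical header row; the second label contains the non-ASCII character \xd0.
header = ["broad_id", "target_\xd0"]

# Encoding the label with the ASCII codec fails, as in the traceback above.
try:
    header[1].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Writing the same row with an explicit UTF-8 encoding succeeds.
path = os.path.join(tempfile.mkdtemp(), "drugs_meta.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(header)

# Read the row back with the same encoding to confirm the label survived.
with open(path, encoding="utf-8") as f:
    first_line = f.readline().strip()
```

Whoever reads the file back has to use the same encoding, otherwise the label will come out garbled.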

bence-szalai commented 4 years ago

Great! I just came back to check it, but it seems like it's fine. Thanks!