Closed ErinWeisbart closed 1 year ago
@MerajRamezani I'm stuck on cleaning up Figure 3C. Starting with DMEM, I get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[21], line 48
46 hit_corr_dic = {}
47 for s in hit_pair_set:
---> 48 hit_corr_dic[s] = corr_dic[s]
50 print(f'For condition {condition} \n Number of hit pairs is {len(hit_pair_set)} \n',
51 f'Number of hit pairs with correlation is {len(hit_corr_dic)}')
53 parent_corr_dic[condition] = corr_dic
KeyError: frozenset({'MALT1'})
Can you take a look and figure out what's going on? Or schedule a meeting so we can talk through it together?
@ErinWeisbart this is what I mentioned in the meeting last week. I have already added some code to resolve this issue:
# Create a list of protein clusters with all complexes that had at least 66% of genes represented within the Hela DMEM WGS hits
cluster_count = 0
hit_cluster_list_list = []
hit_set = set()
for i in range(len(ppi_data_h)):
    cluster = ppi_data_h.iloc[i]['subunits(Gene name)'].split(';')
    count = 0
    hit_cluster_list = []
    for g in cluster:
        if g in genes:
            count += 1
            hit_set.add(g)
            hit_cluster_list.append(g)
    if (count/len(cluster)) >= 0.66:
        cluster_count += 1
    if hit_cluster_list and (count/len(cluster)) >= 0.66:
        hit_cluster_list_list.append(hit_cluster_list)
print(len(hit_set),cluster_count,len(hit_cluster_list_list))
# Assign correlations to hit gene pairs
hit_pair_set = set()
for l in hit_cluster_list_list:
    for c in list(permutations(l,2)):
        hit_pair_set.add(frozenset(c))
hit_corr_dic = {}
for s in hit_pair_set:
    hit_corr_dic[s] = corr_dic[s]
print(' Number of hit pairs',len(hit_pair_set),'\n',
      'Number of hit pairs with correlation',len(hit_corr_dic))
Considering that we decided to use PCA in these analyses, maybe it makes sense for me to update the CORUM & STRING analysis before you rerun all sections? I am also available to meet based on my calendar openings.
Though this isn't quite finished, I'm going to merge it into main.
I can bypass the error I discussed above by ensuring that every pair in hit_pair_set has a length of 2.
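For reference, here is a minimal sketch of that bypass. It assumes (this is not confirmed in the thread) that the single-gene key comes from a gene name appearing more than once in a cluster, so that `permutations` yields a pair like `('MALT1', 'MALT1')`, which collapses to `frozenset({'MALT1'})`. The helper name `build_hit_pairs` and the toy data are hypothetical, and the extra `if s in corr_dic` guard is an addition that is not in the notebook.

```python
from itertools import permutations


def build_hit_pairs(clusters):
    """Collect unordered gene pairs, skipping degenerate single-gene sets."""
    pairs = set()
    for genes_in_cluster in clusters:
        for a, b in permutations(genes_in_cluster, 2):
            pair = frozenset((a, b))
            # A repeated gene name collapses to a one-element frozenset,
            # which has no entry in the correlation dictionary.
            if len(pair) == 2:
                pairs.add(pair)
    return pairs


# Hypothetical toy data; the real values come from the notebook.
clusters = [["MALT1", "BCL10", "MALT1"]]
corr_dic = {frozenset(("MALT1", "BCL10")): 0.8}

hit_pair_set = build_hit_pairs(clusters)
# Assumed extra guard: keep only pairs that actually have a correlation.
hit_corr_dic = {s: corr_dic[s] for s in hit_pair_set if s in corr_dic}
```

If duplicated gene names do turn out to be the cause, deduplicating each cluster before pairing (for example with `set(cluster)`) might be a cleaner fix than filtering afterwards.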
I'm now tracking cleanup needed in #12
Changes to note: Using the updated CCLE data has a small effect on our hit calling as the control groups are slightly changed. This has a ripple effect causing minor changes to many numbers/figures in the notebook. Visible changes include: Fig 2A/B: The number of compartment-specific and whole cell hits has changed from