AllonKleinLab / SPRING_dev


how to pre-process annotations (custom color tracks and groupings) #17

Closed cvillamar closed 5 years ago

cvillamar commented 5 years ago

Dear colleagues, I'm trying to pre-process the required input for a local SPRING server by following the notebook in data_prep/spring_example_pbmc4k.ipynb. However, I am unable to run the function that generates the required files so that it includes my custom annotations with continuous and categorical data.

Here is how I load the annotation files (which I previously saved in the same format that I normally use for the SPRING server hosted by the Allon Klein Lab):

import csv
with open(main_spring_dir + '../../spring.groupings.csv') as csvfile:
    reader = csv.reader(csvfile)
    cell_groupings = {}
    for row in reader:
        key = row[0]
        cell_groupings[key] = row[1:]
with open(main_spring_dir + '../../spring.custom.color.tracks.csv') as csvfile:
    reader = csv.reader(csvfile)
    custom_colors = {}
    for row in reader:
        key = row[0]
        custom_colors[key] = row[1:]

But when I later call the function below to generate the subplots and processed files, it breaks:

out = make_spring_subplot(E, gene_list, save_path,
                    normalize = False, tot_counts_final = total_counts,
                    min_counts = 3, min_cells = 3, min_vscore_pctl = 60, show_vscore_plot = True,
                    num_pc = 60,
                    k_neigh = 5,
                    num_force_iter = 500,
                    cell_groupings = cell_groupings,
                    custom_colors = custom_colors)

Displaying the following:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e5e49c56d331> in <module>()
     11                     num_force_iter = 500,
     12                     cell_groupings = cell_groupings,
---> 13                     custom_colors = custom_colors)
     14 
     15 np.save(save_path + '/cell_filter.npy', np.arange(E.shape[0]))

/restricted/projectnb/crem-bioinfo/project_code/00_pan_project/SPRING_dev/data_prep/spring_helper.pyc in make_spring_subplot(E, gene_list, save_path, base_ix, normalize, exclude_dominant_frac, min_counts, min_cells, min_vscore_pctl, show_vscore_plot, exclude_gene_names, num_pc, sparse_pca, pca_norm, k_neigh, cell_groupings, num_force_iter, output_spring, precomputed_pca, gene_filter, custom_colors, exclude_corr_genes_list, exclude_corr_genes_minCorr, dist_metric, use_approxnn, run_doub_detector, dd_k, dd_frac, dd_approx, tot_counts_final)
    778             save_spring_dir_sparse_hdf5(E, gene_list, save_path, list(links),
    779                             custom_colors = custom_colors,
--> 780                             cell_groupings = cell_groupings)
    781         else:
    782             save_spring_dir_sparse_hdf5(E, gene_list, save_path, list(links),

/restricted/projectnb/crem-bioinfo/project_code/00_pan_project/SPRING_dev/data_prep/spring_helper.pyc in save_spring_dir_sparse_hdf5(E, gene_list, project_directory, edges, custom_colors, cell_groupings)
    654     # save custom colors
    655     custom_colors['Uniform'] = np.zeros(E.shape[0])
--> 656     write_color_tracks(custom_colors, project_directory+'color_data_gene_sets.csv')
    657 
    658     # create and save a dictionary of color profiles to be used by the visualizer

/restricted/projectnb/crem-bioinfo/project_code/00_pan_project/SPRING_dev/data_prep/spring_helper.pyc in write_color_tracks(ctracks, fname)
    598     out = []
    599     for name,score in ctracks.items():
--> 600         line = name + ',' + ','.join(['%.3f' %x for x in score])
    601         out += [line]
    602     out = sorted(out,key=lambda x: x.split(',')[0])

TypeError: float argument required, not str

Here's a view of the input of those annotations:

head -n 5 spring.custom.color.tracks.csv| cut -f 1-4 -d ","
nCount_RNA,15703,18128,41380
nFeature_RNA,4231,3411,6802
percent.mt,5.48302872062663,4.08208296557811,3.33977767037216
nCount_SCT,8331,7728,7891
nFeature_SCT,3526,2333,3155

head -n 5 spring.groupings.csv| cut -f 1-4 -d ","
orig.ident,F00431,F01380,F01391
Diagnosis,IPF,IPF,IPF
Sample_Name,TILD001,TILD028,VUILD64
Sample_Source,NTI,NTI,Vanderbilt
Status,ILD,ILD,ILD

Am I reading the annotations in the wrong format? I was wondering if you had any version of the example notebook that would include annotations (continuous and categorical). Thanks a lot for your help!

calebweinreb commented 5 years ago

Hi,

The problem is that the custom_colors dictionary should have lists of floats for the values, but you have strings. It should work if you modify your code as follows:

import csv
with open(main_spring_dir + '../../spring.groupings.csv') as csvfile:
    reader = csv.reader(csvfile)
    cell_groupings = {}
    for row in reader:
        key = row[0]
        cell_groupings[key] = row[1:]
with open(main_spring_dir + '../../spring.custom.color.tracks.csv') as csvfile:
    reader = csv.reader(csvfile)
    custom_colors = {}
    for row in reader:
        key = row[0]
        custom_colors[key] = [float(v) for v in row[1:]]
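
For anyone landing here with the same error, the fix above can be sketched as a small self-contained helper. This is only an illustration under stated assumptions: `read_tracks` and its `as_float` flag are hypothetical names and not part of SPRING, and the in-memory CSV text stands in for the two files from the question. It is written for Python 3, whereas the traceback above comes from Python 2, so the exact wording of the error message will differ.

```python
import csv
import io


def read_tracks(handle, as_float=False):
    """Read rows of `name,value1,value2,...` into a dict keyed by name.

    Hypothetical helper (not part of SPRING). With as_float=True the
    values are converted to floats, as continuous color tracks require.
    """
    tracks = {}
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        values = row[1:]
        tracks[row[0]] = [float(v) for v in values] if as_float else values
    return tracks


# In-memory stand-ins for the two CSV files shown in the question.
color_csv = io.StringIO("nCount_RNA,15703,18128,41380\npercent.mt,5.48,4.08,3.34\n")
group_csv = io.StringIO("Diagnosis,IPF,IPF,IPF\nStatus,ILD,ILD,ILD\n")

custom_colors = read_tracks(color_csv, as_float=True)  # continuous -> floats
cell_groupings = read_tracks(group_csv)                # categorical -> strings

# Formatting a string with '%.3f' fails, which is the step that broke
# in write_color_tracks in the traceback.
try:
    '%.3f' % '15703'
    failed_on_string = False
except TypeError:
    failed_on_string = True

# With float values the same formatting expression succeeds.
line = ','.join('%.3f' % x for x in custom_colors['nCount_RNA'])
```

Only the continuous tracks are converted. The groupings stay as strings because they hold categorical labels such as `IPF` or `ILD`, which cannot be parsed as numbers.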

(In reply to the original post by Carlos Villacorta, Tue, Sep 17, 2019 at 5:46 PM.)

cvillamar commented 5 years ago

Thanks a lot, Caleb. That solved the issue.