ShobiStassen / PARC

MIT License

Example Usage 2 has some code typos #6

Closed: esimonds closed this 4 years ago

esimonds commented 4 years ago

The examples on the GitHub home page are very useful, but I hit a few bugs in Example 2: (1) an errant quotation mark, (2) a missing dot, and (3) an undefined alias for the numpy package. The code below works:

import parc
import csv
import numpy as np
import pandas as pd

## load data (50 PCs of filtered gene matrix pre-processed as per Zheng et al. 2017)

X = csv.reader(open("./pca50_pbmc68k.txt", 'rt'),delimiter = ",")
X = np.array(list(X)) # (n_obs x k_dim, 68579 x 50)
X = X.astype("float")
# OR with pandas as: X = pd.read_csv("./pca50_pbmc68k.txt", header=None).values.astype("float")

y = [] # annotations
with open('./annotations_zhang.txt', 'rt') as f: 
    for line in f: y.append(line.strip().replace('\"', ''))
# OR with pandas as: y = list(pd.read_csv('./annotations_zhang.txt', header=None)[0])   

# setting small_pop to 50 cleans up some of the smaller clusters, but can also be left at the default 10
parc1 = parc.PARC(X, true_label=y, jac_std_global=0.15, random_seed=1, small_pop=50) # instantiate PARC
parc1.run_PARC() # run the clustering
parc_labels = parc1.labels 
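The loading step above can be checked without the real data files. The following is a minimal sketch that mirrors the csv/numpy path in the snippet on in-memory text. The helper names `load_pcs` and `load_annotations` and the toy values are hypothetical stand-ins for `./pca50_pbmc68k.txt` and `./annotations_zhang.txt`, not part of PARC. The sketch cross-checks the matrix against `np.loadtxt` and does not call PARC itself.

```python
import csv
import io

import numpy as np


def load_pcs(fh):
    """Hypothetical helper: read a comma-separated float matrix (one row per cell)."""
    rows = list(csv.reader(fh, delimiter=","))
    return np.array(rows).astype("float")  # string array -> float64


def load_annotations(fh):
    """Hypothetical helper: one label per line, whitespace and double quotes removed."""
    return [line.strip().replace('"', "") for line in fh]


# Toy in-memory stand-ins (3 cells x 2 PCs); the real PCA file is 68579 x 50.
pcs_text = "0.5,1.25\n-2.0,3.0\n4.5,-0.75\n"
labels_text = '"CD14+ Monocyte"\n"CD8+ Cytotoxic T"\n"CD14+ Monocyte"\n'

X = load_pcs(io.StringIO(pcs_text))
y = load_annotations(io.StringIO(labels_text))

# numpy's own text loader should agree with the csv-based path.
X_check = np.loadtxt(io.StringIO(pcs_text), delimiter=",")

print(X.shape, len(y))
```

Opening the real files with `with open(...) as f:` and passing `f` to these helpers would also close the file handles, which the `csv.reader(open(...))` line in the snippet leaves open.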

ShobiStassen commented 4 years ago

Thank you, Erin! I have updated the code in the examples to reflect this.

esimonds commented 4 years ago

Great, thanks!