davidhallac / TICC

BSD 2-Clause "Simplified" License

where is the paper? #1

Closed mikewin closed 7 years ago

mikewin commented 7 years ago

Please give the name or URL of the paper. Thanks!

BTW: on Windows (64-bit), after running generate_synthetic_data.py and then TICC.py, I find this in the output:

RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
davidhallac commented 7 years ago

Unfortunately, the paper is not available online at the moment (we're still in the middle of the review process, so hopefully it will go online soon...). We'll let you know as soon as it's available, though!

As for your error, I don't have a Windows machine to test it on, but I think it's due to Python's multiprocessing library, which we use for parallelization, behaving weirdly on Windows. You should be able to fix it by wrapping the script in a function called runCode() and then adding this to your Python file:

    if __name__ == '__main__':
        freeze_support()
        runCode()
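
For reference, a minimal self-contained sketch of this idiom follows. The names `square` and `run_code` are hypothetical stand-ins, not functions from TICC.py. It assumes Python 3 and that the worker function is defined at module top level so child processes can import it. The `__main__` guard is what stops child processes from re-running the pool setup on import; `freeze_support()` only matters for frozen Windows executables.

```python
import multiprocessing


def square(x):
    # Worker functions must be defined at module top level so that
    # child processes can import them by name.
    return x * x


def run_code(num_workers=2):
    # Hypothetical stand-in for the body of the script: all process
    # creation happens inside this function, never at import time.
    with multiprocessing.Pool(processes=num_workers) as pool:
        return pool.map(square, range(5))


if __name__ == '__main__':
    # Only relevant for frozen Windows executables; the guard itself
    # is what prevents the bootstrapping RuntimeError.
    multiprocessing.freeze_support()
    print(run_code())
```

The key design point is that nothing at module level starts a process, so importing the file in a child process has no side effects.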

Please let us know if there are any other issues in getting the code running.

Thanks! David

mikewin commented 7 years ago

Under Linux, I get the error below. P.S. I found that cluster is a float; after converting it to an int, this error goes away, but I still have a problem. Please see below.

....

beginning with the DP - smoothening ALGORITHM

completed smoothening algorithm

printing the length of points in each cluster
length of cluster # 0 --------> 0
length of cluster # 1 --------> 0
length of cluster # 2 --------> 0
length of cluster # 3 --------> 0
length of cluster # 4 --------> 0
length of cluster # 5 --------> 0
length of cluster # 6 --------> 0
length of cluster # 7 --------> 0
length of cluster # 8 --------> 541
length of cluster # 9 --------> 0
length of cluster # 10 --------> 0
Traceback (most recent call last):
  File "ticc.py", line 567, in <module>
    true_confusion_matrix = compute_confusion_matrix(num_clusters,clustered_points,sorted_training_idx)
  File "ticc.py", line 201, in compute_confusion_matrix
    true_confusion_matrix[num,cluster] += 1
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
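
The IndexError above is what NumPy raises when an array is indexed with a float. A minimal sketch of the integer-cast workaround, assuming the cluster labels are stored as floats; `count_assignments` is a hypothetical simplified helper, not the repository's `compute_confusion_matrix`:

```python
import numpy as np


def count_assignments(num_clusters, true_labels, clustered_points):
    # Hypothetical helper: rows are true labels, columns are assigned clusters.
    confusion = np.zeros((num_clusters, num_clusters), dtype=int)
    for true_label, cluster in zip(true_labels, clustered_points):
        # Float values such as 1.0 are rejected as indices, so cast explicitly.
        confusion[int(true_label), int(cluster)] += 1
    return confusion


# Float-valued labels, as produced by e.g. np.zeros or np.loadtxt by default.
labels = np.array([0.0, 1.0, 0.0])
assigned = np.array([0.0, 1.0, 1.0])
print(count_assignments(2, labels, assigned))
```

Alternatively, the whole label array can be converted once with `astype(int)` before the loop.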
mikewin commented 7 years ago

Under Windows, I wrapped the code in a runCode function and added this code to TICC.py:

 if __name__ == '__main__':
    multiprocessing.freeze_support()
    runCode()

P.S. I have switched to Linux now. :)

ITERATION ### 0
stacking Cluster # 0 DONE!!!
starting the OPTIMIZATION for cluster# 0
Traceback (most recent call last):
  File "F:/work/test/ml/gp/learn/TICC/paper code/TICC.py", line 877, in <module>
    runCode()
  File "F:/work/test/ml/gp/learn/TICC/paper code/TICC.py", line 467, in runCode
    gvx.Solve(Verbose=False, MaxIters=1000, Rho = 1, EpsAbs = 1e-6, EpsRel = 1e-6)
  File "F:\work\TICC\paper code\solveCrossTime.py", line 124, in Solve
    Verbose)
  File "F:\work\TICC\paper code\solveCrossTime.py", line 414, in __SolveADMM
    pool.map(ADMM_x, node_list)
  File "E:\tools\Miniconda2\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "E:\tools\Miniconda2\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
TypeError: 'NoneType' object has no attribute '__getitem__'
mikewin commented 7 years ago

New problem: there is no file "Inverse Covariance cluster =3.csv" when I use generate_synthetic_data.py with number_of_clusters = 3.

completed solving the optimization problem for the cluster
printing the cluster len
length of the cluster  0 ------> 40
length of the cluster  1 ------> 36
length of the cluster  2 ------> 87
length of the cluster  3 ------> 41
length of the cluster  4 ------> 57
length of the cluster  5 ------> 5
length of the cluster  6 ------> 68
length of the cluster  7 ------> 53
length of the cluster  8 ------> 79
length of the cluster  9 ------> 68
length of the cluster  10 ------> 7
beginning with the DP - smoothening ALGORITHM

completed smoothening algorithm

printing the length of points in each cluster
length of cluster # 0 --------> 0
length of cluster # 1 --------> 0
length of cluster # 2 --------> 0
length of cluster # 3 --------> 0
length of cluster # 4 --------> 0
length of cluster # 5 --------> 0
length of cluster # 6 --------> 0
length of cluster # 7 --------> 0
length of cluster # 8 --------> 541
length of cluster # 9 --------> 0
length of cluster # 10 --------> 0
getting the actual Inverse covariances
getting the actual Inverse covariances
getting the actual Inverse covariances
getting the actual Inverse covariances
Traceback (most recent call last):
  File "TICC.py", line 778, in <module>
    actual_clusters[cluster] = np.loadtxt("Inverse Covariance cluster =" + str(cluster)+".csv", delimiter = ",")
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 858, in loadtxt
    fh = iter(open(fname, 'U'))
IOError: [Errno 2] No such file or directory: 'Inverse Covariance cluster =3.csv'

If I change number_of_clusters to 11 in generate_synthetic_data.py, this error disappears. But in the function computeF1_macro, TP and FP are sometimes 0.0, which causes a new error.
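
The IOError above occurs when the script asks for more "Inverse Covariance cluster =k.csv" files than exist on disk. A minimal sketch of a check that reports missing files instead of crashing; `load_cluster_matrices` and its `directory` parameter are hypothetical and not part of TICC.py, and only the file-name pattern is taken from the traceback:

```python
import os

import numpy as np


def load_cluster_matrices(num_clusters, directory="."):
    # Hypothetical helper: load every per-cluster CSV that exists and
    # collect the paths of those that do not.
    loaded, missing = {}, []
    for cluster in range(num_clusters):
        name = "Inverse Covariance cluster =" + str(cluster) + ".csv"
        path = os.path.join(directory, name)
        if os.path.exists(path):
            loaded[cluster] = np.loadtxt(path, delimiter=",")
        else:
            missing.append(path)
    return loaded, missing
```

A non-empty `missing` list would indicate that the cluster count used for loading does not match the number of files that were generated.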

davidhallac commented 7 years ago

Yup, you are right that changing number_of_clusters will cause the error to disappear, and I get the same computeF1_macro error that you do. I think this might be a corner case where some of the clusters "disappear", meaning they have no points in them, so there's no way to compute the F1 scores...

Instead of changing numClusters in generate_synthetic_data.py, you can get the code running by setting "number_of_clusters = 3" in TICC.py (and keeping numClusters = 3 in generate_synthetic_data.py). I just pushed the updated code to GitHub, so you should be able to re-pull the newest version and run it without any problems.
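
One way to keep a macro F1 computation from failing on empty clusters is to guard each division. The sketch below is not the repository's computeF1_macro. It assumes a square confusion matrix with true labels as rows and predicted clusters as columns, and it scores a cluster with no points as 0, which is a convention chosen for this example:

```python
import numpy as np


def safe_f1_macro(confusion):
    # Hypothetical helper: macro-averaged F1 that tolerates empty clusters.
    confusion = np.asarray(confusion, dtype=float)
    scores = []
    for k in range(confusion.shape[0]):
        tp = confusion[k, k]
        fp = confusion[:, k].sum() - tp   # predicted k, but true label differs
        fn = confusion[k, :].sum() - tp   # true label k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        if precision + recall > 0:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # assumption: an empty cluster scores 0
    return sum(scores) / len(scores)
```

Other conventions, such as skipping empty clusters when averaging, are equally defensible and would yield a different score.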

Does your application require 11 clusters, or were you just trying to get the code up and running? For real-world data, it might be easier to use TICC_solver.py, since it abstracts away a lot of the TICC internals and just requires passing it some data and the parameters. (See the README for sample use instructions).

Thanks for catching this - we're still working on finding all the corner cases in our solver, but please let us know if you come across anything else that's not working as it should be!

davidhallac commented 7 years ago

Hi - just to follow up, the paper is now available online at: http://stanford.edu/~hallac/TICC.pdf.