Closed by mstreit 11 years ago
@ngehlenborg RFC
Firehose uses 80% (at least in the code that I have seen) plus imputation of missing values (#1555). Should we stick to this for now to make sure that we get similar gene lists when sampling?
The look of the matrices with missing values, however, concerns me a bit. I think imputation of missing values should have a higher priority than I originally thought.
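For reference, a minimal sketch of what "threshold plus imputation" could look like. This is not the Firehose code. It assumes that the 80% refers to the minimum fraction of non-missing values a row (gene) must have to be kept, and it uses a simple row-mean imputation as a stand-in, since the thread does not say which imputation method Firehose applies. The function name `filter_and_impute` and the parameter `min_present` are hypothetical.

```python
def filter_and_impute(matrix, min_present=0.8):
    """Drop rows with too many missing values, then fill the remaining gaps.

    matrix: list of rows, each a list of floats with None for missing values.
    min_present: ASSUMED meaning of the 80% threshold, i.e. the minimum
        fraction of non-missing values required to keep a row.
    """
    kept = []
    for row in matrix:
        present = [value for value in row if value is not None]
        # Skip empty rows and rows without any observed value.
        if not row or not present:
            continue
        # Skip rows that fall below the required fraction of observed values.
        if len(present) / len(row) < min_present:
            continue
        # Row-mean imputation: a placeholder, NOT necessarily what Firehose does.
        row_mean = sum(present) / len(present)
        kept.append([row_mean if value is None else value for value in row])
    return kept


example = [
    [1.0, 2.0, None, 3.0, 6.0],    # 4 of 5 present: kept, gap filled with row mean
    [None, None, 1.0, 2.0, 3.0],   # 3 of 5 present: dropped
    [1.0, 1.0, 1.0, 1.0, 1.0],     # complete: kept unchanged
]
filtered = filter_and_impute(example)
```

With a filter like this the matrices no longer contain gaps, at the price of dropping sparse rows, which is why the threshold affects which genes end up in the sampled lists.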
Related to #1534
@ngehlenborg OK, the version in the repository uses 80%.
Also waiting for #1585
We decided to not wait for #1555. That means we are ready to generate the TCGA cal files.
what clusterer should we use: kmeans, tree or affinity?
tree
What we also have to discuss is where to put them on the server.
Currently the TCGA data browser looks within the 3.0 directory, like the pathway/mapping cache loader.
The projects claim to be stored in the 3.0.2 directory.
We somehow need an indicator in the code that tells us when the data packages are incompatible with an old version. I think we will not automatically derive this information from the version number. @sgratzl Do you have a suggestion for how to address this issue?
The data packages' meta info file contains the Caleydo version with which they were produced. That is not the problem.
It is again about where to put the files on the server, as I don't know whether we are upward-compatible (Caleydo 3.0.0 opening a 3.0.2 project).
Being downward-compatible is important. I would ignore upward-compatibility for now. People should use the latest version, but they should be able to load their old projects if possible.
Let's stick to our policy that all 3.0.* packages are compatible with the current build (3.0.2). So the TCGA packages as well as the sample projects will be stored under "3.0" on the server. However, we should rename "3.0/tcga" to "3.0/tcga_sampled" before moving all new TCGA packages with the full matrices to "3.0/tcga".
Yes, agree with all points!
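The policy agreed on above could be sketched roughly like this. It is an illustration only, not code from the Caleydo repository: the function names `parse_version`, `server_directory` and `can_open` are hypothetical, and treating a package that is newer than the running build as "not openable" is an assumption that follows from ignoring upward-compatibility for now.

```python
import re

CURRENT_BUILD = "3.0.2"  # the current build named in this thread


def parse_version(version):
    """Split a 'major.minor.patch' string into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    return tuple(int(part) for part in match.groups())


def server_directory(version):
    """All 3.0.* packages share one server directory, e.g. '3.0'."""
    major, minor, _patch = parse_version(version)
    return f"{major}.{minor}"


def can_open(package_version, build_version=CURRENT_BUILD):
    """Downward compatibility only (ASSUMED reading of the policy).

    A package is accepted when it has the same major.minor as the build
    and was not produced by a newer build than the one opening it.
    """
    package = parse_version(package_version)
    build = parse_version(build_version)
    return package[:2] == build[:2] and package <= build


print(server_directory("3.0.2"))   # directory used for packages and sample projects
print(can_open("3.0.0"))           # old project, current build
print(can_open("3.0.2", "3.0.0"))  # newer project, old build
```

The version stored in the package meta info file would be the input for `can_open`, so the directory layout on the server only needs the major.minor part.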
2013-05-21, 2013-04-23, 2013-03 and 2013-02 are online; the rest is available on demand.
We should probably wait for Nils to confirm the 40% threshold in the sampling. Other than that we are ready to go.