Error in file(file, "rt") : cannot open the connection

galaxyeee commented 1 year ago

Dear infercnv developers,

Hi. I have a problem when using "runCnvAnalysis" function of SPATA2. So, I would appreciate it if you let me know if there is a solution. spata_bm1 <- runCnvAnalysis( object = spata_bm1, directory_cnv_folder = "~/project/Spatial_yonsei/data/cnv-results", # example directory cnv_prefix = "Chr")

Thank you.

Best, EUNHA

GeorgescuC commented 1 year ago

Hi @galaxyeee ,

Since you are using a different package that calls infercnv, I am not sure what is happening there. I took a quick look at the code and they seem to be calling specific methods step by step rather than use the overall infercnv::run() pipeline. As the log suggests, I would try loading the infercnv object saved and try manually plotting it with infercnv::plot_cnv() to have more information about what the error is.

Regards, Christophe.

galaxyeee commented 1 year ago

Hi @GeorgescuC ,

Thank you so much for your answer.
I'm using inferCNV through SPATA2, is there a parameter in which I can designate the cell clusters of observations as the cluster I want when I draw heatmap? Or if heatmap distributes clusters based on CNV, is there a way to see only certain cluster separately?

The above heatmap using my data has only "bm1_spata" in observations.

But like the example heatmap above, could you tell me how to divide the y-axis by clusters that automatically distributed based on cnv (ex, "0", "1", "2") or annotated?

And I have one more question.

I want to use cnv to distinguish between a tumor cell cluster. Unlike clusters that have a clear difference from other clusters, such as cluster 2 and 3, is there a standard to distinguish between ambiguous cases like cluster 1? And How can I know what cells are in 0 to 11 clusters?

Thank you.

Best, EUNHA

GeorgescuC commented 1 year ago

Hi @galaxyeee ,

In the case of the figure you shared with your data, there is only 1 group bm1_spata because that is what infercnv received as annotation group(s) for all non ref cells when the infercnv object was created. In the case of the example figure, each division of the observations represents a different patient origin for the cells in the analysis, which is information provided as input.

The default SPATA2 runCnvAnalysis() call seems to be running subclustering with qnorm, so there should be subclustering information, although I am not sure how good the splits are for this dataset. If you update to use the latest devel version of infercnv, there is a new method called plot_subclusters() that you can use to plot subclusters as if they were annotation groups. Otherwise you can manually run the code for that method:

subcluster_obj = infercnv_obj
subcluster_obj@reference_grouped_cell_indices = list()
for (grp in names(infercnv_obj@reference_grouped_cell_indices)) {
    for (grp2 in names(infercnv_obj@tumor_subclusters$subclusters[[grp]])) {
        subcluster_obj@reference_grouped_cell_indices[[grp2]] = infercnv_obj@tumor_subclusters$subclusters[[grp]][[grp2]]
    }
}

subcluster_obj@observation_grouped_cell_indices = list()
for (grp in names(infercnv_obj@observation_grouped_cell_indices)) {
    for (grp2 in names(infercnv_obj@tumor_subclusters$subclusters[[grp]])) {
        subcluster_obj@observation_grouped_cell_indices[[grp2]] = infercnv_obj@tumor_subclusters$subclusters[[grp]][[grp2]]
    }
}

subcluster_obj@tumor_subclusters = NULL

plot_cnv(subcluster_obj,
         cluster_by_groups=TRUE,
         output_filename = output_filename,
         out_dir=out_dir,
         write_expr_matrix=FALSE)

If the subclustering looks good, you can then combine this with a call to plot_per_group() so each of the subclusters is plotted on a separate figure:

plot_per_group(subcluster_obj, out_dir="~/project/Spatial_yonsei/data/cnv-results")

Since for your data, the hclust seems of rather good quality, you can also add the k_obs_groups option to plot_cnv() to specify in how many branches to cut the hclust, starting from the root. In this case 5 seems like a good number of groups.

For segmentation into specific cnv levels, in infercnv, this is a process done by the HMM within each subcluster. It however appears SPATA2 does not run the HMM or have the option to when you use runCnvAnalysis(), but does its own analysis using the residual expression from infercnv. For questions about that part, I would ask the developers of SPATA2 as it is not code I am familiar with.

Regards, Christophe.

galaxyeee commented 1 year ago

Hi @GeorgescuC ,

It seems difficult to use the code you gave me because my current data is 10X spatial data. Then, if I put k_obs_group = 5, is it right that the cluster is divided into 5 groups based on cnv? I wonder if it is possible to check separately for each group when divided like that. As far as I checked, only the gene barcode for each chromosome is confirmed, so I wonder if I can check the group that clearly shows gain or loss. Thank you so much.

Best, EUNHA

GeorgescuC commented 1 year ago

Hi @galaxyeee ,

The code and method I mentioned in my last post are not specific to spatial or non spatial data, they are only plotting related methods that use results already stored in the processed infercnv object. If you use "k_obs_groups=5", the observations will be split in 5 groups using the cutree() method on the dendrogram that is shown on the left of the figure. So starting from the root, the 4 first branch splits (based on distance to the root) will be cut. Because the dendrogram is the result of a hierarchical clustering of the residual expression in your "cells", these splits will separate the most distant expression profiles. The residual expressions is however not touched in any way by this, the only difference will be that you will have 5 groups, indicated by one of the color bars on the left side, that you can also find in the text file outputs.

To have clear regions and specific CNV states in infercnv predicted automatically, you would need to run the HMM, but SPATA2 does not appear to allow the option, maybe for good reason because of specific features of spatial data, but having not worked with spatial data myself, I cannot say for sure. One thing for example that looks peculiar in your figure is that the middle cluster looks like a "faded out" version of the bottom cluster. If that is a result of having your "cells" in the middle cluster be a mix of malignant and normal cells (or at least "normal" looking from a CNV point of view), that would make sense, but it is not something for which infercnv's HMM is designed.

You can try to use the clusters you obtain using the above mentioned option, or look at the subcluster information (infercnv_obj@tumor_subclusters$subclusters), to calculate metrics for each cluster and compare them to each other. You can for example look at the distribution of residual expression in your references to determine what range of values is normal/healthy, then apply that as a filter to your observations and call CNVs loss/gain regions below/above that range, probably at cluster level, and at different levels of intensity.

Regards, Christophe.

galaxyeee commented 1 year ago

Hi @GeorgescuC ,

Thank you always for your quick and kind reply. It is very helpful in conducting the analysis. I have one last question.

I want to analyze the difference between cnv score of tumor subcluster and cnv score of normal subcluster because of verification. I calculated average and standard deviation of chr5 cnv value of tumor and normal subcluster, because I thought there would be a clear difference between cnv score of two subclusters. But, there were no difference from average and SD. Do you happen to know any other solution that find difference of cnv value?

x = tumor / y = normal

Thank you.

Best, EUNHA

GeorgescuC commented 1 year ago

Hi @galaxyeee ,

How are you selecting the cells from each subcluster you calculate the metrics on? Of the two clusters you have highlighted with red boxes, the top one does not seem very different from the "normal" one on chromosome 5, so comparing those two would likely not show a difference. Another thing is that even with the signal seen, most of the values are still in white/at the median, as you can check by looking at the data quantiles. You could try looking at per gene average/median between the clusters.

Regards, Christophe.

broadinstitute / infercnv

Error in file(file, "rt") : cannot open the connection #513