FedericaBrando closed this 5 months ago
This is probably due to some genes having an empty gene name. It can be solved by using another identifier for those specific rows.
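A minimal sketch of that fallback, assuming the table is a pandas dataframe. The column names SYMBOL and GENE_ID and the identifier values are hypothetical, chosen only for illustration; the real columns may differ.

```python
import pandas as pd


def fill_missing_gene_names(df, name_col="SYMBOL", fallback_col="GENE_ID"):
    """Return a copy where empty or missing gene names are replaced
    by the value of a fallback identifier column.

    Both column names are hypothetical defaults, not the pipeline's own.
    """
    out = df.copy()
    # Treat NaN/None and blank or whitespace-only strings as missing.
    missing = out[name_col].fillna("").astype(str).str.strip() == ""
    # Only the affected rows are touched; valid names are left as they are.
    out.loc[missing, name_col] = out.loc[missing, fallback_col]
    return out


# Toy data: two rows have no usable gene name.
example = pd.DataFrame(
    {
        "SYMBOL": ["TP53", None, "", "KRAS"],
        "GENE_ID": ["ID_A", "ID_B", "ID_C", "ID_D"],
    }
)
fixed = fill_missing_gene_names(example)
```

Only the rows without a usable name are changed, so downstream steps that index by gene name no longer hit empty keys.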
Upon inspection of the code in DriverSummary, the above-mentioned problem is due to these lines, where the name of the cohort used to annotate the vet dataframe is taken from the drivers one.
A major problem, though, is that the zip does not guarantee that the drivers dataframe and the vet dataframe are from the same cohort. Therefore, a NaN is appended for cohorts that have a vet dataframe but no drivers dataframe, or a cohort name is mistakenly assigned to a different cohort's vet dataframe.
Relying on the file name to identify the cohort would solve the issue.
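A rough sketch of what pairing by file name could look like, instead of zipping two lists positionally. The .drivers.tsv suffix follows the {COHORT}.drivers.tsv naming; the .vet.tsv suffix, the function names, and the example paths are assumptions for illustration and are not taken from DriverSummary.

```python
from pathlib import Path


def cohort_from_filename(path, suffix):
    """Extract the cohort name from a file called '<COHORT><suffix>'."""
    name = Path(path).name
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} does not end with {suffix!r}")
    return name[: -len(suffix)]


def pair_by_cohort(driver_files, vet_files,
                   driver_suffix=".drivers.tsv", vet_suffix=".vet.tsv"):
    """Map each cohort to its (drivers_file, vet_file) pair.

    A missing file is reported as None rather than silently shifting
    the pairing, which is what a positional zip would do.
    """
    drivers = {cohort_from_filename(p, driver_suffix): p for p in driver_files}
    vets = {cohort_from_filename(p, vet_suffix): p for p in vet_files}
    cohorts = sorted(drivers.keys() | vets.keys())
    return {c: (drivers.get(c), vets.get(c)) for c in cohorts}


# Cohort B has no drivers file and cohort C has no vet file.
pairs = pair_by_cohort(
    ["out/A.drivers.tsv", "out/C.drivers.tsv"],
    ["out/A.vet.tsv", "out/B.vet.tsv"],
)
```

With this pairing, the cohort used to annotate a vet dataframe comes from the vet file's own name, so a cohort without a drivers file cannot end up with a NaN or with another cohort's label.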
In the unfiltered_drivers.tsv, we end up with some genes with NaNs in some columns. This is probably due to a merge between cohort.tsv and the {COHORT}.drivers.tsv. Further investigation is needed.
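One way to start that investigation is a sketch like the one below, which uses the indicator option of pandas merge to flag rows that found no match. The join key COHORT, the other column names, and the toy values are assumptions; the actual merge in the pipeline may use different keys.

```python
import pandas as pd

# Toy stand-ins for the two tables; all columns and values are hypothetical.
drivers_df = pd.DataFrame(
    {"SYMBOL": ["GENE1", "GENE2", "GENE3"], "COHORT": ["A", "A", "B"]}
)
cohort_df = pd.DataFrame({"COHORT": ["A"], "CANCER_TYPE": ["TYPE_X"]})

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only' or 'right_only' for every row.
merged = drivers_df.merge(cohort_df, on="COHORT", how="left", indicator=True)

# Rows that exist only on the left side are the ones that get NaNs
# in the columns coming from the right table.
unmatched = merged[merged["_merge"] == "left_only"]
unmatched_cohorts = sorted(unmatched["COHORT"].unique())
nan_count = int(merged["CANCER_TYPE"].isna().sum())
```

If the NaNs in unfiltered_drivers.tsv line up with the left_only rows, the cause is a key missing from one of the tables and not a problem in the driver calls themselves.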