Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference
GNU General Public License v3.0

Question about BEELINE PseudoTime.csv file format in BEELINE-data? #90

Open gangcai opened 1 year ago

gangcai commented 1 year ago

Here are the first few lines (header plus example rows) of the PseudoTime.csv file in the Curated GSD inputs:

,PseudoTime1,PseudoTime2
E801_438,,0.54762
E1_769,,0.96241
E1795_733,0.9172899999999999,
E645_767,,0.9599
E487_58,,0.07143
E284_5,,0.0050100000000000006
E1797_596,0.74561,
E1058_571,,0.71429
E426_671,0.8396,

My questions are: (1) Why are there two PseudoTime columns? (2) Why do some rows have missing values?

ktakers commented 1 year ago

1) If more than one trajectory is inferred, there is one PseudoTime column per trajectory. 2) Each cell may be included in one or more trajectories. If a cell is assigned a pseudotime value in one trajectory after a bifurcation, it might not be included in the second trajectory at all. In that case the cell has an empty value in the PseudoTime column corresponding to the second trajectory.
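To make the layout concrete, here is a minimal sketch (not BEELINE's own loader) that groups cells by trajectory using only the Python standard library. The sample rows are copied from the Curated GSD excerpt above; the function name `cells_by_trajectory` is hypothetical, and the sketch assumes a missing value is an empty field, as in that excerpt.

```python
import csv
import io

# Rows copied from the Curated GSD excerpt quoted earlier in this thread.
SAMPLE = """\
,PseudoTime1,PseudoTime2
E801_438,,0.54762
E1_769,,0.96241
E1795_733,0.9172899999999999,
E1797_596,0.74561,
"""


def cells_by_trajectory(handle):
    """Return {column name: [(cell ID, pseudotime), ...]}, sorted by pseudotime.

    Hypothetical helper for illustration; assumes an empty field means the
    cell is not part of that trajectory.
    """
    reader = csv.reader(handle)
    header = next(reader)
    columns = header[1:]  # the first header field is the unnamed cell-ID column
    trajectories = {name: [] for name in columns}
    for row in reader:
        if not row:  # skip blank lines
            continue
        cell_id, values = row[0], row[1:]
        for name, value in zip(columns, values):
            if value.strip():  # non-empty: the cell belongs to this trajectory
                trajectories[name].append((cell_id, float(value)))
    for cells in trajectories.values():
        cells.sort(key=lambda pair: pair[1])
    return trajectories


trajectories = cells_by_trajectory(io.StringIO(SAMPLE))
for name, cells in trajectories.items():
    print(name, [cell_id for cell_id, _ in cells])
```

In this sample every cell appears in exactly one trajectory, but the same loop would also handle a cell that has values in both columns.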

gangcai commented 1 year ago

Thanks for the feedback. I also found that some files use NA instead of an empty value. It would be better to have a consistent format.

e.g., Synthetic/dyn-BFC/dyn-BFC-200-1

,PseudoTime1,PseudoTime2
E53_457,NA,0.56907
E13_736,NA,0.92269
E141_182,0.22053,NA
E35_577,NA,0.72117
E122_475,0.59189,NA
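Until the files are made consistent, a reader can treat both markers as missing. The sketch below is one way to do that with the standard library; `MISSING`, `parse_pseudotime`, and `read_pseudotime` are hypothetical names, and it assumes the empty string and `NA` are the only missing-value markers in the data (only those two appear in this thread).

```python
import csv
import io
import math

# Assumption: these are the only two missing-value markers in BEELINE-data.
MISSING = {"", "NA"}


def parse_pseudotime(value):
    """Convert one CSV field to a float, mapping missing markers to NaN."""
    text = value.strip()
    return math.nan if text in MISSING else float(text)


def read_pseudotime(handle):
    """Return {cell ID: {column name: pseudotime or NaN}}."""
    reader = csv.reader(handle)
    columns = next(reader)[1:]  # drop the unnamed cell-ID column
    table = {}
    for row in reader:
        if row:  # skip blank lines
            table[row[0]] = dict(zip(columns, map(parse_pseudotime, row[1:])))
    return table


# One excerpt per style, with rows copied from the examples in this thread.
EMPTY_STYLE = ",PseudoTime1,PseudoTime2\nE801_438,,0.54762\nE1795_733,0.9172899999999999,\n"
NA_STYLE = ",PseudoTime1,PseudoTime2\nE53_457,NA,0.56907\nE141_182,0.22053,NA\n"

gsd = read_pseudotime(io.StringIO(EMPTY_STYLE))
bfc = read_pseudotime(io.StringIO(NA_STYLE))
print(gsd["E801_438"])
print(bfc["E53_457"])
```

For pandas users, `pandas.read_csv` with its default settings already reads both an empty field and `NA` as NaN, so the two styles should load identically there.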