EducationalTestingService / rstfinder

Fast Discourse Parser to find latent Rhetorical STructure (RST) in text.
MIT License

ValueError: X has 9371 features, but StandardScaler is expecting 74698 features as input. #72

Closed by mufeili 2 years ago

mufeili commented 3 years ago

Hi,

When I evaluated a trained discourse parsing model (e.g., using rst_eval rst_discourse_tb_edus_TRAINING_DEV.json -p rst_parsing_model.C1.0 --use_gold_syntax), I encountered the error in the title.

Since the code uses sparse features, my guess is that the feature sets seen at training time and at evaluation time differ, so the evaluation matrix has fewer columns (9371) than the scaler was fitted on (74698).
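
To illustrate the kind of mismatch guessed at above, here is a minimal sketch in plain scikit-learn. It is not rstfinder or SKLL code, and the feature names and toy data are made up for illustration. It only shows how a scaler fitted on one feature space rejects a matrix built from a different one, and how reusing the fitted vectorizer avoids that.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature dicts; real parser features would be far more numerous.
train = [
    {"word=the": 1, "pos=DT": 1},
    {"word=cat": 1, "pos=NN": 1},
    {"word=sat": 1, "pos=VBD": 1},
]
test = [
    {"word=the": 1, "pos=DT": 1},
    {"word=dog": 1, "pos=NN": 1},
]

# Fit the vectorizer and the scaler on the training data (6 distinct features).
train_vec = DictVectorizer(sparse=True)
X_train = train_vec.fit_transform(train)
# with_mean=False is required because centering would densify sparse input.
scaler = StandardScaler(with_mean=False).fit(X_train)

# Mismatch: a *new* vectorizer fitted on the test data sees only 4 features,
# so the column count no longer matches what the scaler expects.
X_test_bad = DictVectorizer(sparse=True).fit_transform(test)
mismatch_error = None
try:
    scaler.transform(X_test_bad)
except ValueError as err:
    mismatch_error = str(err)
print("Mismatched feature space:", mismatch_error)

# Consistent: reuse the vectorizer fitted on the training data. Features unseen
# in training (here "word=dog") are dropped, and the column count stays at 6.
X_test_ok = train_vec.transform(test)
X_test_scaled = scaler.transform(X_test_ok)
print("Consistent feature space:", X_test_scaled.shape)
```

Whether this is what actually happens inside the rst_eval pipeline is an open question in this thread; the sketch only reproduces the same class of error.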

desilinguist commented 3 years ago

@mufeili did you train the model with rstfinder using the instructions?

mufeili commented 3 years ago

@desilinguist Thank you for your reply. Yes, I followed the instructions. The issue does not occur with skll 2.1, so I guess there is a compatibility issue with skll 2.5.
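
Since the behavior seems to depend on the skll release, it may help to confirm which version is installed in the environment before training or evaluating. The following is a small sketch using only the Python standard library; the helper names installed_version and matches_release are made up for this example, and the "2.1" default simply reflects the version reported to work above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_release(version_string, release="2.1"):
    """Return True if version_string has the same major.minor as release."""
    return version_string.split(".")[:2] == release.split(".")[:2]


skll_version = installed_version("skll")
if skll_version is None:
    print("skll is not installed in this environment")
elif matches_release(skll_version):
    print(f"skll {skll_version} matches the release reported to work in this thread")
else:
    print(f"skll {skll_version} differs from 2.1; the feature-count error may appear")
```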

desilinguist commented 3 years ago

Interesting! Yes, it could certainly be that it's a SKLL 2.5 issue since we haven't really tested rstfinder with that yet.

Glad you have a workaround for now. I will try to replicate the issue on my end and see what changes are required.

mufeili commented 3 years ago

Thank you for developing and maintaining such a great tool!

desilinguist commented 3 years ago

Thank you for the kind words! I am glad you find it useful! :)

ashleylew commented 3 years ago

I'm having this same problem -- just wondering if there are any updates on this issue?

desilinguist commented 3 years ago

Hi @ashleylew, unfortunately, we have still not gotten around to getting rstfinder to work with SKLL 2.5. Do you still see this issue if you use SKLL 2.1?