Open MonolithFoundation opened 3 months ago
@MonolithFoundation Hi! I get the same problems. In particular, on MMbench, MME, SEEDBench, ScienceQA, AI2D datasets.
Hello, we have selectively extracted several subsets from it. Currently, these provide a certain degree of benefit, though the claims about the full distribution are entirely Cambrian's. Large amounts of data can enhance performance to some extent, but the training time is excessively long.
Following the same model setup and training steps, differing only in the data, Cambrian-7M with system prompt data gave bad results (I mean very bad: the model almost failed to talk, reasoning ability was also poor, and all metrics collapsed).
Any reason for this?