Open hemanth opened 1 month ago
Hi there, could you try this with a very small example that consists of only a few entries, e.g., repeated copies of the entry you showed:
[
{
"input": "ये त्रि॑ष॒प्ताः प॑रि॒यन्ति॒ विश्वा॑ रू॒पाणि॒ बिभ्र॑तः । वा॒चस्पति॒र्बला॒ तेषां॑ त॒न्वो॑ अ॒द्य द॑धातु मे ॥ (१)",
"output": "The three qualities of Rajogun, Tamogun and Satogun and earth, water, tej, air, sky, tanmatra and ego, the seven substances travel everywhere in divine form, brahma, the swami of speech, give me the divine power of those elements and substances. (1)",
"instruction": "Convert Sanskrit Text to English"
},
{
"input": "ये त्रि॑ष॒प्ताः प॑रि॒यन्ति॒ विश्वा॑ रू॒पाणि॒ बिभ्र॑तः । वा॒चस्पति॒र्बला॒ तेषां॑ त॒न्वो॑ अ॒द्य द॑धातु मे ॥ (१)",
"output": "The three qualities of Rajogun, Tamogun and Satogun and earth, water, tej, air, sky, tanmatra and ego, the seven substances travel everywhere in divine form, brahma, the swami of speech, give me the divine power of those elements and substances. (1)",
"instruction": "Convert Sanskrit Text to English"
},
{
"input": "ये त्रि॑ष॒प्ताः प॑रि॒यन्ति॒ विश्वा॑ रू॒पाणि॒ बिभ्र॑तः । वा॒चस्पति॒र्बला॒ तेषां॑ त॒न्वो॑ अ॒द्य द॑धातु मे ॥ (१)",
"output": "The three qualities of Rajogun, Tamogun and Satogun and earth, water, tej, air, sky, tanmatra and ego, the seven substances travel everywhere in divine form, brahma, the swami of speech, give me the divine power of those elements and substances. (1)",
"instruction": "Convert Sanskrit Text to English"
},
{
"input": "ये त्रि॑ष॒प्ताः प॑रि॒यन्ति॒ विश्वा॑ रू॒पाणि॒ बिभ्र॑तः । वा॒चस्पति॒र्बला॒ तेषां॑ त॒न्वो॑ अ॒द्य द॑धातु मे ॥ (१)",
"output": "The three qualities of Rajogun, Tamogun and Satogun and earth, water, tej, air, sky, tanmatra and ego, the seven substances travel everywhere in divine form, brahma, the swami of speech, give me the divine power of those elements and substances. (1)",
"instruction": "Convert Sanskrit Text to English"
}
]
This is just to narrow down whether the issue is caused by the non-Latin characters in the input field or by formatting problems in some of the other fields.
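One way to separate these two hypotheses is to generate two tiny files from the same entry, one keeping a Devanagari input and one with an ASCII stand-in, and sanity-check both before running anything else. The following is only a sketch: the helper names (`build_tiny_dataset`, `write_dataset`, `check_dataset`), the file names, and the shortened sample strings are hypothetical, and it assumes the dataset is read as standard JSON with `instruction`, `input`, and `output` string fields, as in the entries above.

```python
import json
import tempfile
from pathlib import Path

# Assumption: every entry needs these three string fields, as in the sample above.
REQUIRED_KEYS = {"instruction", "input", "output"}


def build_tiny_dataset(entry, copies=4):
    """Repeat a single entry to get a minimal dataset."""
    return [dict(entry) for _ in range(copies)]


def write_dataset(path, entries):
    """Write entries as UTF-8 JSON, keeping non-Latin characters unescaped."""
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")


def check_dataset(path):
    """Return a list of problems; an empty list means no problem was found."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"could not parse file: {exc}"]
    if not isinstance(data, list):
        return ["top-level value is not a list"]
    problems = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"entry {i}: not an object")
            continue
        for key in sorted(REQUIRED_KEYS):
            if key not in item:
                problems.append(f"entry {i}: missing key '{key}'")
            elif not isinstance(item[key], str):
                problems.append(f"entry {i}: '{key}' is not a string")
    return problems


if __name__ == "__main__":
    # Shortened placeholder strings, not the full verse from the issue.
    devanagari_entry = {
        "instruction": "Convert Sanskrit Text to English",
        "input": "वाचस्पति",
        "output": "placeholder translation",
    }
    # Same entry with an ASCII stand-in for the input field.
    ascii_entry = dict(devanagari_entry, input="vacaspati")

    out_dir = Path(tempfile.mkdtemp())
    for name, entry in [("tiny_devanagari.json", devanagari_entry),
                        ("tiny_ascii.json", ascii_entry)]:
        path = out_dir / name
        write_dataset(path, build_tiny_dataset(entry))
        print(path, check_dataset(path) or "no problems found")
```

If both files pass this check but only the Devanagari one triggers the error, the non-Latin characters become the more likely cause; if the full dataset fails the check, a formatting problem such as a trailing comma or a missing field is worth ruling out first.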
Bug description
Dataset looks like:
What operating system are you using?
Linux
LitGPT Version