Mangio621 / Mangio-RVC-Fork

*CREPE+HYBRID TRAINING* A very experimental fork of the Retrieval-based-Voice-Conversion-WebUI repo that incorporates a variety of other f0 methods, along with a hybrid f0 nanmedian method.
MIT License
1.01k stars, 217 forks

cli infer: missing/wrong/mystery 'arg 14' ? #69

Open protocolGITHUB opened 1 year ago

protocolGITHUB commented 1 year ago

There's a mistake in the instructions/template when inferring from cli:

Based on the args listed onscreen, this should work, but it doesn't:

model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 False

Error:

if com[14] == "False" or com[14] == "false":
IndexError: list index out of range

If I add an extra 'mystery' arg before the 'False', as in the sample line printed onscreen below the args list, it works (added '.45'):

model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 .45 False

Is that arg ('.45') ignored, or how is it used? I'm hoping to get good hybrid results without arg 14 messing with the result.

EDIT: Looks like it's just a typo; I think the extra arg gets ignored:

protection_amnt = float(com[12])
protect1 = 0.5
if com[14] == "False" or com[14] == "false":
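
To see why the 14-token line fails, here is a minimal sketch. It assumes the typed line is split on whitespace into a zero-indexed list called `com` (the name comes from the traceback). `split_command` and `describe` are hypothetical stand-ins written for this illustration, not the fork's actual code.

```python
import shlex

FAILING = "model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 False"
WORKING = "model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 .45 False"


def split_command(line):
    # Hypothetical stand-in: split the typed line on whitespace.
    return shlex.split(line)


def describe(line):
    """Return (token count, value found at index 14 or None)."""
    com = split_command(line)
    try:
        flag = com[14]  # the index the quoted check reads
    except IndexError:
        flag = None  # fewer than 15 tokens, so com[14] does not exist
    return len(com), flag


print(describe(FAILING))
print(describe(WORKING))
```

The first line has 14 tokens (indexes 0 to 13), so reading `com[14]` raises IndexError. The second has 15 tokens, so `com[14]` is 'False' and the '.45' sits at `com[13]`.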

kalomaze commented 1 year ago

The Quefrency and Timbre options come after 'True' if you set the arg to 'True'. The arg is for the experimental formanting option. E.g., if you don't want to formant:

model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 False

If you do want to formant:

model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 True 8.0 1.2

(8.0 Quefrency and 1.2 Timbre is the 'male to female' conversion preset in the GUI.)

'False' does not require the formant shift options, since formant shifting is turned off. We will update the README explanation accordingly; the CLI explanation was updated in the actual file, but not in the README.
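
The rule described here can be sketched as a small parser. This is only an illustration: `parse_formant_args` and `flag_index` are hypothetical names, and the index 13 used below assumes the flag is the 14th token, as in the two commands in this comment.

```python
def parse_formant_args(com, flag_index):
    """Return (do_formant, quefrency, timbre) from a split command."""
    # Mirrors the check quoted in this thread: only "False" or "false"
    # turns formanting off.
    if com[flag_index] in ("False", "false"):
        return False, 0.0, 0.0
    # Any other value enables formanting; the next two tokens are required.
    return True, float(com[flag_index + 1]), float(com[flag_index + 2])


# The 13 tokens shared by both commands (indexes 0 to 12).
base = "model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16".split()

print(parse_formant_args(base + ["False"], 13))
print(parse_formant_args(base + ["True", "8.0", "1.2"], 13))
```

Passing the flag position as a parameter keeps the sketch independent of which index the real code ends up using.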

protocolGITHUB commented 1 year ago

Thanks! I understand how to use the feature, I was pointing out the bug in the code and description.

infer_web.py, line #1568: if com[14] == "False" or com[14] == "false":

change to: if com[13] == "False" or com[13] == "false":

Until the above line is corrected, you will still need to insert an extra dummy arg14/com[13] for the code to work as is.

model.pth input.wav out.wav logs/index.index 0 -3 hybrid[harvest+crepe] 32 7 0 1 .78 .16 foo False

thanks!

protocolGITHUB commented 1 year ago

Additional info:

corrected indexes (not sure if it breaks elsewhere), infer_web.py line #1568+

if com[13] == "False" or com[13] == "false":
        DoFormant = False
        Quefrency = 0.0
        Timbre = 0.0
        CSVutil(
            "csvdb/formanting.csv", "w+", "formanting", DoFormant, Quefrency, Timbre
        )

else:
        DoFormant = True
        Quefrency = float(com[14])
        Timbre = float(com[15])
        CSVutil(
            "csvdb/formanting.csv", "w+", "formanting", DoFormant, Quefrency, Timbre
        )

TripleKiller666 commented 1 year ago

I noticed that the args don't match up with the example, unless I'm missing something.

Example below:

Arg1) mi-test.pth
Arg2) saudio/Sidney.wav
Arg3) myTest.wav
Arg4) logs/mi-test/added_index.index
Arg5) 0
Arg6) -2
Arg7) harvest
Arg8) 160
Arg9) 3
Arg10) 0
Arg11) 1
Arg12) 0.95
Arg13) 0.33
Arg14) 0.45
Arg15) True
Arg16) 8.0
Arg17) 1.2

In the example above we would set Arg15 to True or False, not Arg14. I don't even know what Arg14 is supposed to be (maybe max breath protection?). Most users would likely skip the Arg14 number and type True or False instead, because the description reads: "arg 14) Whether to formant shift the inference audio before conversion: False (if set to false, you can ignore setting the quefrency and timbre values for formanting)"
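
The mismatch is easier to see when the 1-based Arg numbers are lined up with 0-based list indexes. A minimal sketch, assuming the command is split into a zero-indexed list `com` with the model path at `com[0]`; `EXAMPLE` and `com_index` are names made up for this illustration.

```python
# The 17 values from the onscreen example, in order.
EXAMPLE = [
    "mi-test.pth", "saudio/Sidney.wav", "myTest.wav",
    "logs/mi-test/added_index.index", "0", "-2", "harvest", "160",
    "3", "0", "1", "0.95", "0.33", "0.45", "True", "8.0", "1.2",
]


def com_index(arg_number):
    # Arg numbers in the description start at 1; list indexes start at 0.
    return arg_number - 1


for arg_number, value in enumerate(EXAMPLE, start=1):
    print(f"Arg{arg_number}) com[{com_index(arg_number)}] = {value}")
```

Here Arg15 lands on `com[14]`, the index that the check quoted earlier in this thread reads, while Arg14 (0.45) lands on `com[13]`.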