Open chaserhkj opened 11 years ago
Am I the first one to arrive on GitHub? Anyway, let me show my support first.
chaserhkj, thank you very much for your attention and support! The Rocaloid editor happens to need developers right now.
I look forward to working with you over the summer holidays. Our development QQ group: 255842311
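[I feel it is better to use English on a foreign website... and I need to practice my English more...]
I think the future of Rocaloid is not just about Hatsune Miku voicebanks; I hope it can support user-made voicebanks. I also hope it will eventually support more than Chinese (which will take some work on the overall program architecture).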
@Sleepwalking Regarding the GUI toolkit, I still lean toward Qt. Here are my main reasons:
1. As far as I know, Qt has native support for GTK styles. Setting aside Qt 5 applications, whose own style system is not yet settled, Qt 4 applications work well with most desktop environments on Linux. The GNOME compatibility problem you mentioned seems to have been solved since the Qt 4.3 release.
2. Qt is more than a GUI library. It is a whole application framework that provides low-level data structures, file I/O, networking, multimedia processing, HTML/XML parsing, SQL database support, a scripting framework, a plugin framework and, of course, a graphical user interface. For Rocaloid, multimedia and plugins matter most. First, multimedia processing will be an important part of Rocaloid. I am not talking about voice synthesis, which we should implement ourselves; I mean decoding audio formats (for BGM) and audio playback. With VB, much of this can be done by calling Windows system libraries. Once we move to C++ and care about portability, we have to choose a cross-platform audio library. Second, for the long-term development of the software we need a plugin system, which is another troublesome cross-platform problem because system APIs differ between platforms. Qt provides a cross-platform plugin framework that solves this directly.
3. Qt has the most detailed documentation of any GUI toolkit I know, which makes developing a Qt application much less painful.
So please think carefully about whether to use Qt. Of course, all of this is only my personal suggestion. If you insist on wxWidgets, I can still write the code, and I will respect the project leaders' decision = w =
BTW, I have joined the QQ group, but I rarely use QQ, so please send important messages via GitHub, email or GTalk. My email address is on my GitHub profile page.
Best wishes, Chaserhkj
Sorry to interrupt, but you may be interested in Pascal. Switching from VB to Pascal can be fairly easy. You could try the Lazarus IDE, an open-source clone of Delphi. It is not only an IDE; it also ships with a complete library called the LCL (Lazarus Component Library), and creating controls is as easy as in VB. The slogan of Lazarus is "Write Once, Compile Everywhere", which shows its portability. However, for the long-term development of Rocaloid, I still suggest Qt, which is much more mature. I strongly advise against wxWidgets, because it only covers the GUI; audio playback, configuration file handling and database queries would have to be implemented separately. Still, I cannot stop you if you insist on wxWidgets. I do not use QQ. Please email me if there is something important; my address is on my GitHub profile page.
Best regards, Star Brilliant
For outputting the preview audio stream directly, calling PulseAudio might be the better choice. But I had not considered audio encoding and decoding; that was my oversight. I have used quite a few programs built with wxWidgets and written some myself, so I find it easy to use, although it does have some awful features. After hearing everyone's suggestions, I think we need to think this over some more.
I am strongly against tight coupling with any GUI framework. If needed, just entrust the GUI work to others.
FrankHB wrote:
Strongly against tight coupling with any GUI framework. If needed, just entrust the GUI work to others.
I support this. The core engine must be separated from the GUI front end. The typical model is that the GUI front end outputs an intermediate file that serves as the input of the CLI engine, and everything happens automatically. This lets others develop a different GUI for your engine.
Good idea. I agree. So we should focus on the engine and parameter generator and let someone else open another repository for the GUI. And I think a CLI-based engine would result in a flashing command line window after you press 'synthesis'... bad experience. So we'd better provide both a CLI-based engine and a dynamic link library version.
On 2013-6-14,16:09,"Sleepwalking" wrote:
And I think a CLI-based engine would result in a flashing command line window after you press 'synthesis'... bad experience.
Didn't you know that in Linux, a terminal window will never show up unless you ask to?
Didn't you know that in Windows, there is a SW_HIDE option for CreateProcessW and WinExec?
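A minimal sketch of the point above, written in Python rather than against the Win32 API directly. It assumes the front end launches the engine through the standard `subprocess` module; on Windows, `STARTUPINFO` with `STARTF_USESHOWWINDOW` and `SW_HIDE` is the same mechanism that `CreateProcessW` exposes. The function names and the stand-in engine command are hypothetical.

```python
import subprocess
import sys


def hidden_window_kwargs():
    """Return Popen keyword arguments that suppress the console window.

    On Windows this requests SW_HIDE through STARTUPINFO; on other
    platforms nothing is needed because no terminal window is created.
    """
    if sys.platform != "win32":
        return {}
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    startupinfo.wShowWindow = subprocess.SW_HIDE
    return {"startupinfo": startupinfo}


def run_engine_quietly(command):
    """Run a command-line engine in the background and return its stdout."""
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        check=True,
        **hidden_window_kwargs(),
    )
    return result.stdout


# Stand-in for the real engine: a child Python process that prints one line.
DEMO_ENGINE = [sys.executable, "-c", "print('rendered')"]
```

Calling `run_engine_quietly(DEMO_ENGINE)` returns the child's output without any window appearing, which is the behaviour the comment describes for both platforms.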
On 2013-6-14,16:09,"Sleepwalking" wrote:
So we'd better provide both CLI based engine and the dynamic link library version.
However, you should be aware that if your engine crashes, the external-process model protects your editor and your work; with the dynamic-library model, a crash takes the editor down with it and leaves you nothing.
Let us take VOCALOID and UTAU as examples.
The core engine of VOCALOID is in a DLL. I often experience freezes when I write overly complex phonetic symbols, and I cannot even stop the playback! (Hitting the stop button will not stop the render until the note currently being rendered is finished.)
In UTAU, if I make a note too short in renzoku mode, the render process crashes as well. But the editor is safe! And the most exciting thing is that I can even render in batch, thanks to the command-line interface!
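The crash-isolation argument can be illustrated with a small sketch. It assumes the editor starts the renderer as a child process and only inspects its exit status; the two renderer commands below are hypothetical stand-ins (one aborts, one succeeds), not real Rocaloid or UTAU binaries.

```python
import subprocess
import sys

# Hypothetical stand-ins for a renderer: one dies abruptly, one succeeds.
CRASHING_RENDERER = [sys.executable, "-c", "import os; os.abort()"]
WORKING_RENDERER = [sys.executable, "-c", "print('audio written')"]


def render(command):
    """Run the renderer in its own process and report success or failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # The renderer crashed, but this (editor) process is still alive
        # and can show an error instead of losing the user's work.
        return False, f"renderer failed with exit status {result.returncode}"
    return True, result.stdout.strip()
```

With an in-process library, the equivalent of `os.abort()` would terminate the editor itself; here the editor simply receives a failure result.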
As for the console or terminal window you mentioned, have you noticed that when you hit Compile, your IDE is actually executing gcc, cc1, as and ld in order in the background (or CL, LINK, etc. if you use Microsoft Visual Studio)? Most users will probably never be aware of this!
What about media converter software like avidemux? It calls ffmpeg (or just loads the ffmpeg library, which is called libavformat.so / libavformat.dll if I remember correctly), and the user never notices. The opposite is Ulead VideoStudio, which often freezes when I encode 1080p video, only because it renders in the same process as the editor.
@Sleepwalking Flashing command window issue was definitely due to incorrect calling of external program or due to incorrect flag passed to compiler when building...
Invoking system calls from frontend program can surely smoothly do the job
@chaserhkj wrote:
Flashing command window issue was definitely due to incorrect calling of external program or due to incorrect flag passed to compiler when building... Invoking system calls from frontend program can surely smoothly do the job
As you see, @Sleepwalking has his mind stuck to Windows.
Quote the comment I have just sent a few minutes ago:
Didn't you know that in Linux, a terminal window will never show up unless you ask to? Didn't you know that in Windows, there is a SW_HIDE option for CreateProcessW and WinExec?
OK... You win... I admit that my mind was stuck on Windows... I'm learning Linux these days...
Is there any way to share memory between different processes? I'm also afraid of frequent IO on disk, like what Utau does.
@Sleepwalking wrote,
Is there any way to share memory between different processes? I'm also afraid of frequent IO on disk, like what Utau does.
If you have tried X11 (it is currently a must-have component of Linux), you will know how to do it.
X11 uses both socket/pipe and shared memory.
I don't recommend shared memory, since it is implemented differently in Windows and in Linux, even though shared memory is much, much faster than pipes and sockets. In Windows, shared memory is set up with file mappings (CreateFileMapping and MapViewOfFile), while WM_COPYDATA messages only copy data from one window to another.
I recommend using pipe.
What is a pipe? When you execute dir | more in the command prompt, you are using a pipe.
In Linux, we have other pagers besides more, such as less, w3m, most, or even the text editor vim (invoked as vim - with the argument -). No matter which pager you choose, the program dir or ls will not behave differently.
So the advantage of pipe is that, if one side changes to another compatible program, they will still work.
It is true in Rocaloid because someone else may write a new graphical front-end to replace the current one.
Another reason I discourage shared memory is that if you write another GUI front end to work with it, unexpected results may appear. Use shared memory only if you are copying gigabytes of data.
In Windows, you can use CreatePipe, and in Linux, you can use pipe, to create a pair of pipe ends.
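As a rough illustration of the `dir | more` idea, the sketch below connects two child processes through an OS pipe using Python's `subprocess` module, which wraps `CreatePipe` on Windows and `pipe` on Linux. The producer and consumer commands are hypothetical stand-ins; either side could be swapped for another compatible program without changing the other.

```python
import subprocess
import sys

# Stand-in programs: a producer that lists items and a consumer that numbers
# whatever lines it receives, playing the roles of `dir` and `more`.
PRODUCER = [sys.executable, "-c", "print('a.wav'); print('b.wav')"]
CONSUMER = [
    sys.executable,
    "-c",
    "import sys\nfor i, line in enumerate(sys.stdin, 1): print(i, line.strip())",
]


def run_pipeline(producer, consumer):
    """Connect the producer's stdout to the consumer's stdin through a pipe."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        consumer, stdin=first.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy so the consumer sees end-of-file when the producer exits.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output.splitlines()
```

Neither child knows what is on the other end of the pipe, which is the compatibility property the comment relies on.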
Do you know why UTAU does so much disk I/O? It saves temporary WAV files on disk, one note per file. I believe that is unnecessary. You can do better, without using too much disk I/O.
I recommend the model of Kdenlive and Melt. Kdenlive is a video sequence editor. Melt is a video renderer, and it can call ffmpeg to encode the video.
When you finish a project and hit Render in Kdenlive, it exports a short script that Melt understands to a temporary folder.
Then Kdenlive calls Melt. Melt keeps printing the current percentage, and Kdenlive uses a pipe to read it and show it to the user.
Melt, in turn, calls ffmpeg to encode the video to a format you probably know, such as mp4 or avi.
The advantage is that if Melt someday switches to GStreamer to encode the video, Kdenlive will not behave differently. This is unachievable with a dll or so.
Every program does its own job and does it well. It is the spirit of UNIX and Linux. You can choose not to use Linux for daily life, but you may not choose to refuse the spirit. It will help you much.
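The Kdenlive and Melt arrangement described above can be sketched as a front end that reads progress lines from a child process over a pipe. The line format `progress <number>` and the fake renderer command are assumptions made for this example; they are not Melt's actual output format.

```python
import subprocess
import sys

# Hypothetical renderer stand-in that prints one percentage per line.
FAKE_RENDERER = [
    sys.executable,
    "-c",
    "for p in (0, 50, 100): print(f'progress {p}', flush=True)",
]


def render_with_progress(command, on_progress):
    """Run the renderer and forward each reported percentage to a callback."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            parts = line.split()
            # Only lines of the assumed form "progress <number>" are parsed.
            if len(parts) == 2 and parts[0] == "progress" and parts[1].isdigit():
                on_progress(int(parts[1]))
    # Leaving the with-block waits for the child, so returncode is set here.
    return proc.returncode
```

A GUI would pass a callback that updates a progress bar; the renderer itself stays unaware of which front end is listening.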
I suggest exposing the core engine API and building both CLI and GUI as front-ends based on this API layer. There are 2 different points.
Reply @FrankHB
You are not thinking of the real situation of Rocaloid. Rocaloid has two parts: the editor and the renderer.
The editor should just edit the project files. When editing is finished, the user can either press the 'Export to WAV...' button to run the renderer automatically in the background, or close the editor and open a terminal to call the renderer (especially useful for batch rendering).
Although the first way is more common, I think that the editor should be separated from the renderer.
The editor has nothing to do with the renderer. If the user asks to preview a small segment of the whole synth project, the editor can just execute the renderer to render the required seconds or minutes.
They do not have to communicate too much.
If we implement it as dynamic libraries, we will be unable to call the renderer individually.
@m13253 It seems that you are trying to apply some Unix philosophy to our project, and that's fine. I fully agree with your recommendations.
On the other hand, I also agree with @Sleepwalking's recommendation to build shared libraries, for the sake of the extensibility of our engine. Providing a shared library makes it possible to embed our engine into other applications, which would surely be great.
So, I think that our application may have three parts:
This is just like Kdenlive, which has a Qt-based GUI front end that calls the ffmpeg CLI tool to render, and ffmpeg is itself a CLI front end to the libavutil and libavcodec libraries...
This also leaves room for other GUI developers who prefer C/C++ library APIs over pipes for communication between front end and back end.
On the other hand, you decided not to depend on Qt for the back end, but I still strongly recommend that we implement a plugin framework so the rendering engine can be extended with other features. We can either rely on system calls to implement this ourselves, or depend on other open-source, cross-platform frameworks.
I agree with @chaserhkj. Maybe I expressed my idea inaccurately. What I have in mind for the application is a lib + CLI + GUI model.
I don't think we should depend on Qt for the back end. I have not used Qt for development, but I have tried Gtk. If you use such a framework, parts of the overall structure of the application have to change... I cannot express my idea precisely. Anyhow, I do not recommend Qt for the back end, but I strongly recommend Qt for the front end since it is highly portable.
@m13253 There is no contradiction between implementing the core API in (dynamic) libraries and keeping the editor separate from the renderer. I have not said there should be only one executable program. I've emphasized the call path, which means the back ends do not always have to run as separate processes. The library provides an API that lets developers build programs with a UI when needed. Both the editor and the renderer seen by end users can be thin wrappers above these libraries. Their only responsibility is to provide a UI. But the editor does not call the renderer directly. Users are free not to care whether a separate process is used when the renderer is called in the first way. For the second way, if opening a terminal to call the renderer is feasible, opening a GUI application to do the same thing should also be feasible. How could others implement that conveniently without an exposed API?
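To make the "thin wrapper over a library API" idea concrete, here is a toy sketch. It assumes a library layer with a single hypothetical function, `render_segment`, and a command-line front end that does nothing except parse arguments and call it; a GUI could call the same function in-process or launch the CLI as a separate process. None of these names come from Rocaloid.

```python
import argparse


# Library layer: the only place that knows how to "render".
def render_segment(notes, start=0, end=None):
    """Return the slice of notes to synthesize (placeholder for real work)."""
    return list(notes[start:end])


# CLI front end: a thin wrapper over the library API.
def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy renderer front end")
    parser.add_argument("notes", nargs="+", help="note names to render")
    parser.add_argument("--start", type=int, default=0)
    parser.add_argument("--end", type=int, default=None)
    args = parser.parse_args(argv)
    rendered = render_segment(args.notes, args.start, args.end)
    print(" ".join(rendered))
    return 0
```

For example, `main(["C4", "D4", "E4", "--end", "2"])` renders only the first two notes, which is the kind of partial preview discussed earlier in the thread.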
Have you thought about real-time rendering? If the user has entered a very long set of notes in the editor, I guess it would take some time to process the intermediate files between the editor and the engine. A CLI-based engine would be difficult to use for real-time rendering... You may achieve that through pipes, but it would be much more difficult to program than using a shared library.
See five posts above:
If the user ask to preview a small segment of the whole synth project, the editor can just execute the renderer to render the required seconds or minutes.
However I am convinced that there should be a shared library. Thank you for your patience.
@Sleepwalking I don't think using @m13253's infrastructure would cause much of a problem for performance or coding. Building a system out of many small components connected by pipes and sockets is a mature approach that is well supported by the APIs and features of nearly every platform. This is just a preference, or a so-called philosophy of coding. It won't be complex.
On the other hand, this "Unix style" of building a system gives a highly flexible structure, because every component can be taken out and used separately. Our work can then be shared with other projects, mixed with others, or given multiple front ends. We could still provide users with an "official" selection of components so that an installation works out of the box, while leaving freedom to users who would like to choose for themselves, which would surely be great.
@chaserhkj Well, I have no more ideas about this topic since I haven't used the pipes or sockets...
We should quickly reach an agreement. I have too many things to do this summer...
I agree: a lib + a CLI + a GUI, just as @chaserhkj said seven posts above.
Hi Sleepwalking and other developers,
I'm making a Qt4 GUI for resamplers: QTau, a UTAU-like editor that can use any resampler. Coding was started by Tobias Platen here: https://gitorious.org/lauloid The author is making a resampler based on libsms (Spectral Modelling Synthesis from Pompeu Fabra University in Barcelona), and another one based on "World" (Sekai).
That is, if you need a UTAU-like GUI for your synthesizer, I'm making one, and like all Qt applications, it supports localization to any language and is cross-platform (works on Windows and Linux atm).
@digited Thank you for your project.
I am not a contributor of Rocaloid (Sleepwalking is), but I would like to help translate your QTau.
I have searched the database but have not found where the .po or .ts files are stored.
Please give me instructions on how to localize your project.
P.S. sorry I don't have a Gitorious account so I leave my message here.
@digited It's very kind of you to offer the QTau editor for Rocaloid. We have partly finished rewriting Rocaloid in C++, but we decided to pause the rewrite and plan to develop a better synthesis engine instead. The current C++ version of Rocaloid is already able to synthesize (with minor bugs), and we are going to use the same script format for our new engine. So it would be nice to collaborate, and it does not matter if the engine is updated.
But I'm not sure about the compatibility between QTau and Rocaloid. The conversion from notes to speech goes through two steps in Rocaloid, and its synthesis engine relies heavily on parameters (and currently requires a massive database).
I hope Rocaloid can have its own GUI in the future, but it's OK to use QTau or another GUI for now, since we won't have a GUI until the new Rocaloid is finished (which might be half a year from now).
@Sleepwalking
its synthesis engine heavily relies on parameters
What parameters? What controls and settings do you need in GUI frontend?
I also plan to support interface extensions for QTau, so that synthesizers loaded as Qt plugins (cross-platform DLLs) can extend the user interface with the special UI controls they need. Everything is possible.
QTau is fully open source and fully free, and it is custom software, so anything is possible. Just please describe what you need from the GUI.
@digited
What parameters?
The CVS and RSC structures. CVS describes the phonetic details of syllables, such as phonemes, durations, envelopes, etc. RSC is like the .vsqx or .vsq files in Vocaloid, or *.ust in UTAU. The Rocaloid Engine is mainly composed of two parts: the CVE synthesis engine and RSCCommon, which converts between RSC and CVS.
What controls and settings do you need in GUI frontend?
I guess the easiest (not the best) way is to start RSCCommon first and then start CVE. You can pass some basic parameters (lyrics and notes) to RSCCommon and let it do the conversion for you. But this limits your access to CVS, which is crucial for producing a better song. You may embed RSCCommon in your editor, or make changes to the *.cvs file it produces, which requires much more work. The specifications for RSC and CVS files are easy to learn, and the files are stored as plain text, so they are easy to read and write. I've described those file formats here: http://bbs.ivocaloid.com/thread-115484-1-2.html http://bbs.ivocaloid.com/thread-115503-1-1.html
I can't read Chinese, sorry.
How can I build and test your synthesizer?
@digited Sorry for Chinese...
The newest code is at rgwan's fork: https://github.com/rgwan/Rocaloid The current version is developed on Ubuntu with Anjuta, and the repository is actually an Anjuta project (I used Anjuta 3.9.1). It hasn't been tested on Windows yet. rgwan said he had tried to compile it with MinGW. The compilation was successful, but a runtime error occurred saying that a shared library was missing. I guess static linking could solve that problem. By the way, the original .NET-based version is in my Rocaloid 1.6.0 branch. It has the same functionality as the C++ version; the only difference is speed (the C++ version is 6 times faster...).
Feel free to ask us any questions about Rocaloid.
@rgwan The command line arguments are your part, please describe them.
Wait a moment... Some code has not been merged yet. I'm dealing with it now.
@digited Everything is ready. I've merged all the features into my repository. In addition, the "minor bugs" I mentioned were removed. The CLI for CVE (cvecli) is finished but the one for RSCCommon (rsctool) is not. There is a simple snippet for loading, converting and storing those file formats in /RocaloidEngine/src/main.cc
You need a sound database and a dictionary to run Rocaloid, which were attached with the .net version: http://pan.baidu.com/share/link?shareid=540236&uk=3423845838
Here are some cvs & rsc files for test: http://pan.baidu.com/share/link?shareid=3408246916&uk=3423845838
15 kB/s (with my 100 Mbit/s connection), and it was cancelled near the end; retrying. Downloading will take some time.
The download has been cancelled for the 4th time, and resuming isn't supported. Can you please upload those big files to a service outside the Chinese network segment?
It's a little bit hard... We're in the same situation... I'll try. Uploading to RapidShare, at a relatively slow but steady speed... Is that OK?
@Sleepwalking Downloaded in a flash, thanks. Will try soon.
Since you have .NET version, why don't you try to port it to Mono, I just wonder?
QTau looks like this now:
Needs some more work before publishing. Hope to finish proper gui demo this week. QTau is licensed under WTFPL (http://www.wtfpl.net/about/), so you can use it for any purpose and in any way you like (I'll keep working on it of course).
upd license changed to BSD to avoid offending some fragile souls (that doesn't affect Rocaloid in any way). upd2 obviously no demo "this week", things are changing as I go. It's great that I don't work alone on it now.
Both UTAU and Cadencii (http://vocaloid.wikia.com/wiki/Cadencii) use resamplers as external processes, which requires the resampler to do this on every launch of the synthesizing process:
While there's no reason not to add same functionality to QTau, I'd prefer to use Qt plugins - cross-platform dynamic libraries with extended functionality.
Optimizing disk i/o and extensibility can give QTau an edge over both UTAU and Cadencii, I hope. (besides UTAU being stalled since 2011 and Cadencii requiring .NET/mono or Java, that is)
Qt plugins are compiled with g++, so plugin may wrap any code in C/C++, it can be just a special manager class to your already existing C/C++ synthesizer (even if you use STL, Boost and other non-Qt utils).
All right. I'm going to check it out.
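Maybe I could help with the front end as well. My idea is to build it with Python + pygame (SDL); I have written a fairly complete GUI library with pygame. I suggest wrapping the engine in a DLL and exposing some high-level and low-level interfaces for Python to call. That is roughly it. BTW, I am bad at both English and math.
Oh, and about making voicebanks: my idea is to drive pronunciation through a dictionary that maps the International Phonetic Alphabet (IPA) to local scripts (kana, romaji, Chinese characters, pinyin, and so on). Once things are more complete, I can provide a boy voice and a middle-aged male voice for testing. I have high hopes for this project!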
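Sleepwalking and I plan to use X-SAMPA phonetic symbols. (At first Sleepwalking worried that this would lead to too many diphone combinations, and he later decided to build a compatibility layer from local languages to X-SAMPA; you will have to ask him for the details.) The advantage of X-SAMPA is that its phonetic notation is compatible with well-known software such as VOCALOID and mbrola.
I am also very familiar with Python, but we have decided on Qt and C++. Besides, we already have an editor.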
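Um... I know this is bad practice: the issue tracker is for tracking issues, not for forum chat. But I don't have posting permission on the iVocaloid forum, and I was afraid that emailing the developers directly would end up in a spam filter, so I am posting here.
First, I think I count as half a programmer. I have been learning programming for about two years and know C/C++/Python/Go/JavaScript, and I am fairly familiar with Linux and the open-source ecosystem. So much for the self-introduction.
Next: Sleepwalking-san, this project of yours is great!!! I have long wanted to tune Chinese songs with Hatsune Miku, but I know nothing about phonetics, so I could never do anything, and I have no idea how to do reverse engineering either, so I could not handle Vocaloid. I am also busy and have no large blocks of free time. So now I hope to take part in this project. I saw on the forum that you said you are weak at GUI and C++. I happen to be a bit stronger there and can help with front-end development. Of course, I don't think I can do the back end at all (lol).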
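For now I have these suggestions:
1. I suggest developing in C++. C++ has good encapsulation, the language is relatively intuitive, and it is convenient for front-end development. Both development efficiency and runtime efficiency are relatively high. Actually, Python would be the most convenient for the front end, but for cross-platform reasons, deploying a Python runtime on Windows is a bit painful.
2. I suggest not using wxWidgets for the GUI; use Qt instead. Qt is much easier to learn and understand than wxWidgets. You said you were scared off by MFC's Hello World when learning C++; wxWidgets has the same style as MFC, and everyone knows how inhumanly complex the MFC API is. Compared with wxWidgets, Qt is much easier to learn, and Qt is cross-platform too. Qt is also the GUI toolkit I know best.
3. About the open-source license: I think I should remind you that GPLv3 permits commercial use. The GPL only forbids companies from taking the code into closed-source software. If a company modifies the code and keeps it open, or even sells it, that does not violate the GPL as long as it provides the source code. But the GPL also permits redistribution, so when a user buys GPL software from a company, copying it for others or sharing it online is still completely legal. In other words, the GPL does not prohibit commercial use; it just removes most of the point of selling the software. On the other hand, prohibiting commercial use is not something the open-source spirit endorses. If you really want to prohibit commercial use, please don't use the GPL; consider a CC license instead.
4. More on open source: for the license notice, the usual practice is to put a LICENSE.txt with the full license text in the root directory, plus a COPYING.txt with a short copyright notice. For example, the short notice for the GPL is:
In addition, each source file should normally carry the copyright notice in a comment as well.
Well, that is basically it. I look forward to working together (although I will only have time to write code in the summer holidays, after final exams (:3L))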