gkapfham closed 1 year ago
Hi @connellyw, you need to make sure that you check off each of the items in the above list, making it clear on what page in your document you have made the improvement that resolves the issue. You also need to answer the questions that are at the bottom of the improvement list. Ultimately, the answers to these questions should influence the final draft of your document. We should plan to meet during office hours to discuss all of these ways in which you need to significantly improve your document in advance of the final deadline.
@gkapfham
My tool takes as input the market history data of Bitcoin and Ethereum, covering September 2019 to the present day. The main data collected for later use includes the time, the market open and close prices, the volume of trades, and the highest and lowest price of the coin during each hour. The tool then breaks the data down into 60-day segments and randomizes them so that there is less of a chance of the model overfitting the data before it is applied to live market data. The model attempts to predict the price three hours ahead of the current time in the live market.
The output of the program comes from applying the model to live market data that is collected from the Binance API. The strength of the prediction is factored directly into the amount of the coin that the program buys. If the returned value is greater than 0, the program indicates that the price will go up. If the returned value is less than 0, the program indicates that the price is going down, and it will sell. The general key components of my tool are shown in the technical diagram below.
For the most part, I created and developed the tool myself. I did, however, draw inspiration for some parts from previous projects that I found online. For instance, when choosing the hyperparameters for the RNN, I drew inspiration from similar stock prediction tools. The hyperparameters that I chose are often used in LSTM stock market prediction models. After changing them to fit my needs, I successfully implemented my own version in my tool. Another idea that I took from someone else is the PCA part of my tool. PCA is often used when analyzing the stock market, and I decided to create my own version of it in my tool. I felt that this was important to have because a tool like this throws a lot of information at the user, so there must be an easy way to interpret the data.
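A minimal sketch of the segmentation step described above, under several assumptions that the thread does not confirm: the segments do not overlap, each segment is labeled with the relative change in close price three hours after it ends, and only the close price is used (the real tool also collects open, high, low, and volume). The function name `make_segments` and its parameters are hypothetical.

```python
import random

HOURS_PER_SEGMENT = 60 * 24  # 60 days of hourly candles
HORIZON = 3                  # predict three hours ahead


def make_segments(closes, segment_len=HOURS_PER_SEGMENT, horizon=HORIZON, seed=0):
    """Cut hourly close prices into non-overlapping segments, label each one with
    the relative price change `horizon` hours after it ends, and shuffle the pairs.

    Returns a list of (segment, target) tuples. Non-overlapping windows and the
    relative-change target are assumptions made for illustration.
    """
    pairs = []
    # Last index at which a full segment plus its future target still fits.
    last_start = len(closes) - segment_len - horizon
    for start in range(0, last_start + 1, segment_len):
        end = start + segment_len
        last_close = closes[end - 1]
        future_close = closes[end - 1 + horizon]
        target = (future_close - last_close) / last_close
        pairs.append((closes[start:end], target))
    # Shuffle the order of the segments; the hours inside each segment stay in order.
    random.Random(seed).shuffle(pairs)
    return pairs


if __name__ == "__main__":
    demo = [100.0 + i for i in range(50)]
    for segment, target in make_segments(demo, segment_len=10, horizon=3):
        print(segment[0], segment[-1], round(target, 4))
```

Shuffling whole segments, rather than individual hours, keeps the time order inside each training example intact while still varying the order in which the model sees different market periods.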
Another part of the tool for which I drew heavily on other people's work is the creation of the buy and sell orders. When creating this section of the tool, I had to consult many sources on how to conduct these transactions successfully. While this might seem like a straightforward step, Binance requires certain parameters when creating these orders. I had to consult the Binance documentation to determine which arguments the order function needed in order to actually work. Stack Overflow was also vital in troubleshooting the errors that I was getting from the Binance API.
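To make the order step concrete, here is a hedged sketch of how a signed prediction could be turned into the parameters of a market order. The sizing rule (linear in the prediction strength, capped at a maximum) and the function `build_market_order` are hypothetical, not the tool's actual logic. The field names `symbol`, `side`, `type`, and `quantity` follow the Binance Spot order documentation, and the step-size and minimum-quantity checks stand in for the exchange's lot-size rules; both should be verified against the current documentation. Sending the order (authentication, signing, timestamps) is left out.

```python
from decimal import Decimal


def build_market_order(symbol, prediction, max_quantity, step_size, min_quantity):
    """Map a signed prediction to market-order parameters, or None to skip trading.

    prediction > 0 means the model expects the price to rise (buy);
    prediction < 0 means it expects the price to fall (sell).
    Sizing linearly by |prediction| is an assumption made for illustration.
    """
    if prediction == 0:
        return None
    side = "BUY" if prediction > 0 else "SELL"
    strength = min(abs(prediction), 1.0)  # cap the strength at 1.0
    raw = Decimal(str(max_quantity)) * Decimal(str(strength))
    step = Decimal(str(step_size))
    # Round down to a whole multiple of the step size the exchange accepts.
    quantity = (raw // step) * step
    if quantity < Decimal(str(min_quantity)):
        return None  # too small to be a valid order
    return {
        "symbol": symbol,
        "side": side,
        "type": "MARKET",
        "quantity": format(quantity, "f"),  # plain decimal string, no exponent
    }


if __name__ == "__main__":
    print(build_market_order("BTCUSDT", 0.5, "0.01", "0.0001", "0.0001"))
    print(build_market_order("ETHUSDT", -2.0, "0.5", "0.001", "0.001"))
```

Using `Decimal` instead of floats avoids quantities such as `0.0049999999`, which an exchange would typically reject for not matching the step size.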
My tool is most successful at predicting whether the price of a given coin is going to increase or decrease: it was correct about 60% of the time. My tool is not effective at predicting the precise amount by which the coin is going to go up or down. In fact, it does a fairly bad job of this. This matters because the predicted amount factors into the strength of the prediction later on, when conducting trades in the live market. For future research, I suggest identifying target market segments and training the model on them. Training on market segments that are similar to the current day could lead to better results when trying to predict the price. Also, training the model specifically on price and volume might make the data set less “confusing” to the model and return more precise results.
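The two notions of effectiveness in this answer can be stated as metrics. The sketch below assumes that "correct about 60% of the time" refers to directional accuracy (the predicted and actual price changes have the same sign) and that precision of the predicted amount is measured with mean squared error (MSE); the thread does not confirm either definition, and the function names are hypothetical.

```python
def directional_accuracy(predicted, actual):
    """Fraction of samples where the predicted and actual changes agree in direction.

    A change of exactly 0 is treated as "not up", which is an arbitrary choice.
    """
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual changes."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up relative price changes, for illustration only.
    predicted = [0.02, -0.01, 0.03, -0.02, 0.01]
    actual = [0.01, -0.03, -0.01, -0.01, 0.04]
    print(directional_accuracy(predicted, actual))
    print(mean_squared_error(predicted, actual))
```

A model can score well on the first metric and poorly on the second, which matches the behavior described above: the direction is often right while the size of the move is not.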
The RNN would perform best on a coin that has fewer fluctuations in its price. This is why the model does better when trained on Bitcoin than on Ethereum, where the price jumps around a lot. These changes confuse the model when it tries to predict future market data. Conversely, it would be bad at predicting a coin where there is a massive volume of transactions every hour. Something like Dogecoin, where the price can change by thousands of dollars within the hour, would lead to poor performance from my tool. That is why I decided to focus on the two main coins that are considered to be the most stable in the field (not including stablecoins, where the price is intended not to change). These coins are also the best indicators of how the crypto market is doing.
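One way to make "fewer fluctuations" measurable is the standard deviation of hourly log returns, a common volatility estimate. This is an assumption added for illustration, not a metric the answer above states, and `hourly_volatility` is a hypothetical name.

```python
import math
import statistics


def hourly_volatility(closes):
    """Sample standard deviation of hourly log returns.

    Needs at least three close prices so that there are two returns to compare.
    """
    returns = [math.log(later / earlier) for earlier, later in zip(closes, closes[1:])]
    return statistics.stdev(returns)


if __name__ == "__main__":
    # Made-up hourly close prices, for illustration only.
    calm = [100, 101, 100, 101, 100]
    jumpy = [100, 120, 90, 130, 80]
    print(hourly_volatility(calm), hourly_volatility(jumpy))
```

Comparing this number across coins over the same period would give a concrete basis for saying that one coin is more stable, and therefore a better fit for the model, than another.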
Hi @connellyw, thanks for these responses. Can you please make sure that this content is integrated into the final version of your thesis? Also, I want to point out that this diagram shows phases but does not seem to show the inputs and outputs of these phases. It is also important to note that the diagram is not "balanced" and is, moreover, lacking in sufficient technical detail. Can you please revise it further so that it addresses these issues?
Also, @connellyw, I have noticed that you have not yet addressed any of the 15 major revision requirements for your document, at least based on this checklist. Can you please start working on these tasks, checking them off when finished, and then indicating where in your document you made the requested changes? Thanks, I appreciate your proactive attention to detail as you work to finalize your document and ensure that your thesis meets the baseline requirements.
all suggestions addressed
Hello @connellyw, here is the detailed feedback and questions after your thesis defense:
Process Reminders
Here is feedback on your senior thesis chapters. Please bear in mind that you need to resolve all of these issues in advance of submitting the final version of your senior comprehensive project! Here is a reminder of the deadlines for the Spring 2023 semester in CMPSC 610:
Week One (January 16 - 20) through Week Ten (March 20 - 24): Implementation, experimentation, and writing.
March 24 at 11:59 PM: Submit the final draft version as a tagged release in the document's GitHub repository.
Week Eleven (March 27 - 31) through Week Fourteen (April 17 - 21): Oral defense of senior theses.
Week Fifteen (April 24 - April 28) through Week Sixteen (May 1 - May 5): Final revisions to senior thesis document.
May 5 at 11:59 PM: Submit the final version of senior thesis as a tagged release in the document's GitHub repository.
You will note that the end of this issue gives you questions that you must respond to within 24 hours. Please excerpt each of these questions in the issue tracker and then provide a detailed, thoughtful written response. If there is an extenuating circumstance that prevents you from responding within 24 hours, please alert me to the challenges that you face as soon as possible.
Detailed Feedback
Here are additional details that you must consider as you complete the final draft of your senior comprehensive project:
[x] Don't forget that, in addition to finishing the final version of your senior thesis you also need to have a completed version of your computational artifact that meets all of the baseline requirements outlined in the syllabus.
[x] Please make sure that your senior comprehensive project meets all of the baseline requirements outlined in the syllabus. You should use the syllabus as a checklist so as to confirm that you meet every baseline requirement. If you judge that it is fundamentally impossible for you to meet one of these baseline requirements, please meet with me to explain why that is the case.
[x] The thesis document that you submitted still contains template content. You need to remove all of this template content before you submit the final version of your senior thesis. For instance, you need to remove the listing of the tables if you are not going to include any tables in your thesis.
[x] There are formatting mistakes in the current draft of your thesis. Please coordinate with Professor Luman to ensure that all of these issues are promptly resolved. If you have not done so already, you need to review (and most likely accept) the changes in the PR that Professor Luman raised in the GitHub repository that contains your senior thesis chapters.
[x] A lot of the content in the introduction is written in a "personal" fashion instead of in a "professional" fashion. You should rewrite most of this content so that it takes a professional standpoint and includes references to support the claims that you are making.
[x] There are a number of sentences in your senior thesis that are written in a way that is not grammatically correct. For instance, you wrote: "Challenges have arouse throughout this research". As it is written, a reader will not be able to parse this sentence. Can you please rewrite it and other sentences in your thesis that do not use correct grammar? You need to make sure that your sentences are clear, syntactically and grammatically correct, and interesting and fun to read. To conclude this point, I want to stress that there are numerous writing mistakes in this document that all must be resolved in advance of the final deadline.
[x] After reading the first few pages of your thesis, it is not clear what it is that you implemented and why this tool is actually needed. Can you please make more clear what are the knowledge and practice gaps and how your system fills those gaps? It is also worth noting that I had the same experience when listening to your presentation: the overall contribution is not very clear.
[x] If possible, you should try to avoid sentences that have one phrase only at either the top or the bottom of a page. For instance, on page 8 there is a case where there is only a single phrase at the top of the page. Sometimes you can fix that by moving around floating elements or adding to a paragraph.
[x] Your thesis has several different technical diagrams that have a different theme to them. Did you create all of these diagrams yourself? If you did, then please make sure to unify their theme (i.e., the colors and the thickness of the lines and other visual aspects of the diagram). If you did not create these diagrams, then do you have permission to use them? Overall, you should aim to create your own original diagrams that have a consistent theme.
[x] The aspect ratio of many of the graphs is not correct. Did you create these graphs? If not, then I encourage you to make your own graphs that explore a specific experimental trend. Overall, all of the graphs inside of the thesis document should be your own intellectual property and of a consistent form in terms of, for instance, aspect ratio, colors, and symbols.
[x] It looks like there are some places in your thesis that do not consistently use the same referencing standard. For instance, you use the reference standard "An approach to predict ..." on page 20 but then use "[1]" on several other pages. Overall, you should use the second approach as it is the standard in computer science and for the thesis document. You also need to make sure that you include the reference, like "[8]" and then place the period after the reference. To be clear, you need to have a consistent and correct format for all references across the entire thesis document!
[x] Your thesis uses words like "complexity" and "scalability" without a clear definition of these terms. Can you please make sure that every one of the evaluation metrics that you mention is first clearly defined?
[x] Can you add more content to your thesis about the steps that a person would take if they want to install your tool? How many steps are involved in this process? What are the software requirements associated with using this tool? Overall, the thesis needs more details about these issues. It is also important to note that the technical diagram in Figure 9 is poorly formatted and lacking in suitable details; please heavily revise it.
[x] It is not possible to read any of the graphs that you have included in your thesis. Please make sure to change the theme, font size, and other aspects of the graphs so that they can be easily read. I should also point out that every metric that you visualize through a graph needs to have an associated definition through an equation in your thesis. For instance, what is the precise meaning of "accuracy"? I see that your thesis references, for instance, MSE, without giving an equation and/or a reference that precisely defines what the MSE is and how it is calculated and used in your research.
[x] The thesis needs more details about the threats to the validity of the experimental results and the limitations of the presented technique. Can you please add more details about these issues? Finally, you can use this content as a springboard for additional material in your final chapter that is about the ways in which future researchers can extend your project. Thanks!
Project Questions
What are the inputs, outputs, and behaviors of your tool? While I acknowledge that you have explained some aspects of your tool, can you please include a technical diagram that shows its key components? Finally, what are the aspects of the tool that you implemented and which are those that are reused?
What are the ways in which your tool does (and does not) work effectively? For the ways in which your tool does not work, can you please outline a plan that you would suggest for resolving these issues?
What are circumstances in which your tool would (and would not) make good predictions and recommendations to someone who wanted to make investments in the cryptocurrency market? Why would it make these predictions and recommendations?
Note: Your responses to the second and third questions must contain a clear statement of the evaluation metrics by which you would measure words like "effective" and "ineffective" and "good" and "bad".